Unfortunately I don't have my OSP16 deployment any longer, but I tested it and hit the same issue. I think the version I was using was: openstack-tripleo-heat-templates-11.6.1-2.20220409014852.el8ost.noarch.rpm

+++ This bug was initially created as a clone of Bug #2106643 +++

Description of problem:
Adding a second Ceph RBD pool to Cinder via an overcloud update may fail to restart/update cinder.conf in the Cinder volume container, causing the absence of the expected new Cinder backend. If we add the second-pool yaml on the initial overcloud deployment, the pool/backend is always added as expected.

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-14.3.1-0.20220628111342.7c969c5.el9ost.noarch
Also hits 16.2 (will duplicate the bz), but shouldn't happen on 16.1.

How reproducible:
Every time; tested at least twice on OSP17 and 16.2.

Steps to Reproduce:
1. On an existing deployment, add a yaml which should create/add a new RBD pool, for example:

   $ cat extra_templates.yaml
   parameter_defaults:
     CephPools:
       - {"name": vol2, "pg_num": 32, "pgp_num": 32, "application": rbd}
     CinderRbdExtraPools:
       - vol2

2. After the overcloud update completes, we confirm on the Ceph node that the new pool is created. cinder.conf also includes the new pool/backend. However, running `cinder service-list`, the new pool/backend isn't listed as either up or down; it's missing altogether.
3. If you then restart the c-vol container, the backend suddenly shows in `cinder service-list` with an "up" status as expected, after which a volume can be successfully created on it.

Quoting Alan's RCA from an email: the issue happens only when cinder-backup is enabled. The tripleo-ansible role that handles restarting pacemaker services isn't aware there are two separate pcmk cinder services to consider. It detects that cinder.conf has changed, and uses that information to restart the cinder-backup service. But then, when it checks again for the cinder-volume service, it reaches the wrong conclusion. It doesn't restart the cinder-volume service, because restarting c-bak tricked it into thinking c-vol was OK.

Actual results:
The new pool is created on the Ceph side, but c-vol doesn't notice the change, so the second backend isn't added, until we restart the c-vol container, which then pulls in the config changes and adds the missing second backend.

Expected results:
The second Ceph pool should show up as a second Cinder backend pool, without having to manually restart the c-vol container.

Additional info:
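Alan's RCA above can be illustrated with a minimal bash sketch (a hypothetical illustration of the decision flow, NOT the actual tripleo-ansible role code): a single "cinder.conf changed" result is consumed by whichever pcmk service is checked first, so the second service never restarts.

```shell
config_changed=1   # cinder.conf was modified by the overcloud update
restarted=""

# Hypothetical helper: restart a pcmk service only if the config changed.
maybe_restart() {
  local svc=$1
  if [ "$config_changed" -eq 1 ]; then
    restarted="$restarted $svc"
    config_changed=0   # flag cleared after the first restart: this is the bug
  fi
}

maybe_restart openstack-cinder-backup   # sees the change, restarts c-bak
maybe_restart openstack-cinder-volume   # change already "handled": no restart

echo "restarted:$restarted"   # prints "restarted: openstack-cinder-backup"
```

The fix has to track the config change per pcmk service rather than as a single shared state.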
Verified on:
openstack-tripleo-heat-templates-11.6.1-2.20220821010130.b1e9bfe.el8ost.noarch

Deployed a basic Ceph system; it just so happens that the job I used had created two backends:

(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+---------------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                            | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+---------------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-1                    | nova | enabled | up    | 2022-09-04T12:56:02.000000 | -               |
..
| cinder-volume    | hostgroup@tripleo_ceph          | nova | enabled | up    | 2022-09-04T12:56:03.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph_fastpool | nova | enabled | up    | 2022-09-04T12:56:03.000000 | -               |

Again I created a yaml to add a third Cinder Ceph pool/backend:

(overcloud) [stack@undercloud-0 ~]$ cat extra_templates.yaml
parameter_defaults:
  CephPools:
    - {"name": vol2, "pg_num": 32, "pgp_num": 32, "application": rbd}
  CinderRbdExtraPools:
    - vol2

Added the above yaml to the overcloud_deploy.sh command and updated the overcloud. As expected, the result was that the new third Ceph pool/backend now exists:

(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+---------------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                            | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+---------------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-1                    | nova | enabled | up    | 2022-09-04T14:30:03.000000 | -               |
..
| cinder-volume    | hostgroup@tripleo_ceph          | nova | enabled | up    | 2022-09-04T14:30:04.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph_fastpool | nova | enabled | down  | 2022-09-04T13:36:24.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph_vol2     | nova | enabled | up    | 2022-09-04T14:30:04.000000 | -               |

Before the fix I had to manually restart c-vol to get the added pool/backend to show up. This time the new backend tripleo_ceph_vol2 was created/added and is up without a manual c-vol restart, so this is good to verify. I fear there is a new issue/bug here which might explain why tripleo_ceph_fastpool is now down, as it was up before the update; I'll consult with dev about it.
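Backends stuck in the "down" state, like tripleo_ceph_fastpool above, can be picked out of saved `cinder service-list` output with a quick filter (a sketch over sample rows copied from this report; field positions assume the default table layout):

```shell
# Sample cinder-volume rows from the verification run above (truncated columns).
cat > service-list.txt <<'EOF'
| cinder-volume | hostgroup@tripleo_ceph          | nova | enabled | up   |
| cinder-volume | hostgroup@tripleo_ceph_fastpool | nova | enabled | down |
| cinder-volume | hostgroup@tripleo_ceph_vol2     | nova | enabled | up   |
EOF

# Fields split on '|': $3 = Host, $6 = State (the leading '|' makes $1 empty).
awk -F'|' '$6 ~ /down/ {gsub(/ /, "", $3); print $3}' service-list.txt
# prints: hostgroup@tripleo_ceph_fastpool
```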
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Release of components for Red Hat OpenStack Platform 16.2.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:8794