Description of problem:

When deploying cinder-backup in active-active mode by using the cinder-backup-active-active.yaml environment file, the deployment is successful, but then only one of the three cinder-backup services is up:

(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                     | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0             | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
| cinder-backup    | controller-1             | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
| cinder-backup    | controller-2             | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
| cinder-scheduler | controller-0             | nova | enabled | up    | 2023-02-06T21:31:42.000000 | -               |
| cinder-scheduler | controller-1             | nova | enabled | up    | 2023-02-06T21:31:36.000000 | -               |
| cinder-scheduler | controller-2             | nova | enabled | up    | 2023-02-06T21:31:38.000000 | -               |
| cinder-volume    | hostgroup@tripleo_netapp | nova | enabled | up    | 2023-02-06T21:31:44.000000 | -               |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+

Looking at the logs of the services, the failing ones contain:

2023-02-07 01:48:18.068 33350 INFO cinder.service [-] Starting cinder-backup node (version 18.2.2)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service [-] Error starting thread.: tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 277, in _start
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     fileutils.ensure_tree(a_dir)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_utils/fileutils.py", line 44, in ensure_tree
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     os.makedirs(path, mode)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib64/python3.9/os.py", line 225, in makedirs
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     mkdir(name, mode)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service PermissionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service During handling of the above exception, another exception occurred:
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 807, in run_service
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     service.start()
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/service.py", line 220, in start
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     coordination.COORDINATOR.start()
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/coordination.py", line 67, in start
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self.coordinator.start(start_heart=True)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 689, in start
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     super(CoordinationDriverWithExecutor, self).start(start_heart)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 426, in start
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self._start()
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 279, in _start
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     raise coordination.ToozConnectionError(e)
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
2023-02-07 01:48:18.098 33350 ERROR oslo_service.service

This happens regardless of the cinder-backup backend.

An initial discussion with the developers (thanks Alan) suggests this may be the same as, or very similar to, an old cinder-volume issue:
https://bugs.launchpad.net/tripleo/+bug/1908750

Version-Release number of selected component (if applicable):

python3-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch
openstack-tripleo-common-containers-15.4.1-1.20230119220943.4e21638.el9ost.noarch
openstack-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch
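
The traceback shows the tooz file driver failing inside os.makedirs() while creating '/var/lib/cinder/groups'. As a rough aid for triaging this kind of failure, the following sketch reports which existing ancestor directory the creation would have to happen in, who owns it, and whether the current process may create entries there. The helper name diagnose_tree is hypothetical and not part of cinder, tooz, or oslo.utils. It only reflects the permissions of the user it runs as, so it is meaningful only when run as the same user and in the same container as the failing service, which is an assumption about how it would be used. It does not identify the root cause by itself.

```python
import os


def diagnose_tree(path):
    """Report why os.makedirs(path) might fail for the current user.

    Hypothetical helper: walks up from `path` to the nearest ancestor
    that exists, then checks whether the current process may create
    entries inside that ancestor.
    """
    path = os.path.abspath(path)
    ancestor = path
    while not os.path.exists(ancestor):
        parent = os.path.dirname(ancestor)
        if parent == ancestor:  # defensive: reached the filesystem root
            break
        ancestor = parent
    st = os.stat(ancestor)
    return {
        "path": path,
        "exists": os.path.isdir(path),
        "nearest_existing": ancestor,
        "owner_uid": st.st_uid,
        "owner_gid": st.st_gid,
        "mode": oct(st.st_mode & 0o7777),
        # Creating a subdirectory needs write and search permission
        # on the directory that will contain it.
        "can_create": os.access(ancestor, os.W_OK | os.X_OK),
    }
```

Calling diagnose_tree('/var/lib/cinder/groups') and printing the result on a failing controller and on the working one would show whether the owner or mode of the nearest existing directory differs between them.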
No doc update required, as this is associated with a new feature in RHOSP.
With the latest set of available packages, the issue is fixed:

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-common
python3-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch
openstack-tripleo-common-containers-15.4.1-1.20230223221300.d447618.el9ost.noarch
openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch

(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                   | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0           | nova | enabled | up    | 2023-03-03T10:21:53.000000 | -               |
| cinder-backup    | controller-1           | nova | enabled | up    | 2023-03-03T10:21:58.000000 | -               |
| cinder-backup    | controller-2           | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
| cinder-scheduler | controller-0           | nova | enabled | up    | 2023-03-03T10:21:55.000000 | -               |
| cinder-scheduler | controller-1           | nova | enabled | up    | 2023-03-03T10:21:56.000000 | -               |
| cinder-scheduler | controller-2           | nova | enabled | up    | 2023-03-03T10:22:01.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+

This feature was never released, and the bug was found and fixed during the pre-release development and testing cycle, so it is not a regression. I'm going to close this bug.
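
For repeated verification runs, the manual check of the service list can be scripted. The sketch below is not part of any OpenStack tooling: the function names parse_service_list and services_down are hypothetical, and it assumes the output has been captured as text in the ASCII table layout that "cinder service-list" prints in this report. The sample rows are copied from the verification output.

```python
SAMPLE = """\
+---------------+--------------+------+---------+-------+----------------------------+-----------------+
| Binary        | Host         | Zone | Status  | State | Updated_at                 | Disabled Reason |
+---------------+--------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup | controller-0 | nova | enabled | up    | 2023-03-03T10:21:53.000000 | -               |
| cinder-backup | controller-1 | nova | enabled | up    | 2023-03-03T10:21:58.000000 | -               |
| cinder-backup | controller-2 | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
+---------------+--------------+------+---------+-------+----------------------------+-----------------+
"""


def parse_service_list(text):
    """Parse the ASCII table into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, shell prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first data line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def services_down(text, binary="cinder-backup"):
    """Return the hosts on which `binary` is not reported as up."""
    return [
        row["Host"]
        for row in parse_service_list(text)
        if row["Binary"] == binary and row["State"] != "up"
    ]


if __name__ == "__main__":
    print(services_down(SAMPLE))  # prints [] because all three are up
```

Run against the table from the original description, the same function would return controller-0 and controller-1, the two hosts reported as down.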