Bug 2167954
| Summary: | cinder-backup in active/active mode only works on the controller where cinder-volume is running | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Luigi Toscano <ltoscano> |
| Component: | openstack-tripleo-common | Assignee: | Alan Bishop <abishop> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Luigi Toscano <ltoscano> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 17.1 (Wallaby) | CC: | mburns, slinaber |
| Target Milestone: | beta | Keywords: | Triaged |
| Target Release: | 17.1 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2023-03-03 16:06:10 UTC | Type: | Bug |
| Bug Blocks: | 1666804 | ||
No doc update required, as this is associated with a new feature in RHOSP.

With the last set of available packages, the issue is fixed:

    (overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-common
    python3-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch
    openstack-tripleo-common-containers-15.4.1-1.20230223221300.d447618.el9ost.noarch
    openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch

    (overcloud) [stack@undercloud-0 ~]$ cinder service-list
    +------------------+------------------------+------+---------+-------+----------------------------+-----------------+
    | Binary           | Host                   | Zone | Status  | State | Updated_at                 | Disabled Reason |
    +------------------+------------------------+------+---------+-------+----------------------------+-----------------+
    | cinder-backup    | controller-0           | nova | enabled | up    | 2023-03-03T10:21:53.000000 | -               |
    | cinder-backup    | controller-1           | nova | enabled | up    | 2023-03-03T10:21:58.000000 | -               |
    | cinder-backup    | controller-2           | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
    | cinder-scheduler | controller-0           | nova | enabled | up    | 2023-03-03T10:21:55.000000 | -               |
    | cinder-scheduler | controller-1           | nova | enabled | up    | 2023-03-03T10:21:56.000000 | -               |
    | cinder-scheduler | controller-2           | nova | enabled | up    | 2023-03-03T10:22:01.000000 | -               |
    | cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
    +------------------+------------------------+------+---------+-------+----------------------------+-----------------+

As this feature was never released and this is a bug that was found and fixed in the pre-release development/testing-rinse-and-repeat cycle, which means this is not a regression, I'm going to close this bug.
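The `cinder service-list` check used in this report can be scripted when it has to be repeated across deployments. The following is only a rough sketch, assuming the output has been captured as text in the ASCII-table layout shown in this bug; `parse_service_list`, `down_services`, and `SAMPLE` are hypothetical names for illustration and are not part of any cinder client API.

```python
# Hypothetical helper: parse captured `cinder service-list` output and
# report which hosts have a given service that is not "up".

SAMPLE = """\
+---------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| Binary        | Host                     | Zone | Status  | State | Updated_at                 | Disabled Reason |
+---------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup | controller-0             | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
| cinder-backup | controller-1             | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
| cinder-backup | controller-2             | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
| cinder-volume | hostgroup@tripleo_netapp | nova | enabled | up    | 2023-02-06T21:31:44.000000 | -               |
+---------------+--------------------------+------+---------+-------+----------------------------+-----------------+
"""


def parse_service_list(text):
    """Turn the ASCII table into a list of dicts keyed by column header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines and anything else
        # Splitting on "|" yields empty strings at both ends; drop them.
        cells = [cell.strip() for cell in line.split("|")[1:-1]]
        if header is None:
            header = cells  # first data-looking line is the header row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def down_services(text, binary="cinder-backup"):
    """Return the hosts where `binary` is reported in a state other than 'up'."""
    return [
        row["Host"]
        for row in parse_service_list(text)
        if row["Binary"] == binary and row["State"] != "up"
    ]


if __name__ == "__main__":
    print(down_services(SAMPLE))
```

An empty list from `down_services` corresponds to the fixed state, where every cinder-backup instance reports `up`.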
Description of problem:

When deploying cinder-backup in active-active mode by using the cinder-backup-active-active.yaml environment file, the deployment is successful, but then only one of the three cinder-backup services is up:

    (overcloud) [stack@undercloud-0 ~]$ cinder service-list
    +------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
    | Binary           | Host                     | Zone | Status  | State | Updated_at                 | Disabled Reason |
    +------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
    | cinder-backup    | controller-0             | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
    | cinder-backup    | controller-1             | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
    | cinder-backup    | controller-2             | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
    | cinder-scheduler | controller-0             | nova | enabled | up    | 2023-02-06T21:31:42.000000 | -               |
    | cinder-scheduler | controller-1             | nova | enabled | up    | 2023-02-06T21:31:36.000000 | -               |
    | cinder-scheduler | controller-2             | nova | enabled | up    | 2023-02-06T21:31:38.000000 | -               |
    | cinder-volume    | hostgroup@tripleo_netapp | nova | enabled | up    | 2023-02-06T21:31:44.000000 | -               |
    +------------------+--------------------------+------+---------+-------+----------------------------+-----------------+

Looking at the logs of the services, the failing ones contain:

    2023-02-07 01:48:18.068 33350 INFO cinder.service [-] Starting cinder-backup node (version 18.2.2)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service [-] Error starting thread.: tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 277, in _start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     fileutils.ensure_tree(a_dir)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_utils/fileutils.py", line 44, in ensure_tree
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     os.makedirs(path, mode)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib64/python3.9/os.py", line 225, in makedirs
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     mkdir(name, mode)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service PermissionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service During handling of the above exception, another exception occurred:
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 807, in run_service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     service.start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/service.py", line 220, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     coordination.COORDINATOR.start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/coordination.py", line 67, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self.coordinator.start(start_heart=True)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 689, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     super(CoordinationDriverWithExecutor, self).start(start_heart)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 426, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self._start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 279, in _start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     raise coordination.ToozConnectionError(e)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service

This happens regardless of the cinder-backup backend.

An initial discussion with the developers (thanks Alan) suggests this may be the same as, or very similar to, an old cinder-volume issue: https://bugs.launchpad.net/tripleo/+bug/1908750

Version-Release number of selected component (if applicable):

    python3-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch
    openstack-tripleo-common-containers-15.4.1-1.20230119220943.4e21638.el9ost.noarch
    openstack-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch
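The traceback in this report shows the coordination start-up failing when the directory `/var/lib/cinder/groups` cannot be created, with the underlying `PermissionError` re-raised as a coordination error. The sketch below mimics that pattern using only the standard library, together with a pre-flight check that could be run as the service user to see whether the directory is creatable. It is an illustration, not the tooz or oslo.utils code: `CoordinationStartError`, `start_file_coordinator`, and `can_create_dir` are hypothetical names, and the actual fix for this bug was delivered in openstack-tripleo-common, not by a check like this.

```python
import os


class CoordinationStartError(Exception):
    """Hypothetical stand-in for the coordination error seen in the log."""


def start_file_coordinator(groups_dir):
    """Create the groups directory, wrapping OS errors as the log shows.

    Assumption: this only imitates the create-then-wrap pattern visible in
    the traceback; it does not reproduce the real driver's behaviour.
    """
    try:
        os.makedirs(groups_dir, exist_ok=True)
    except OSError as exc:
        raise CoordinationStartError(str(exc)) from exc


def first_existing_ancestor(path):
    """Walk upwards from `path` until something that exists is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create_dir(path):
    """Return True if `path` exists as, or could be created under, a
    directory that the current user can write to and traverse."""
    ancestor = first_existing_ancestor(path)
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    target = "/var/lib/cinder/groups"  # path taken from the traceback
    print(target, "creatable by this user:", can_create_dir(target))
```

Running the check inside the affected cinder-backup container, as the user the service runs as, would be one way to confirm whether a permission problem on the parent directory is what the service is hitting.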