
Bug 2167954

Summary: cinder-backup in active/active mode only works on the controller where cinder-volume is running
Product: Red Hat OpenStack
Reporter: Luigi Toscano <ltoscano>
Component: openstack-tripleo-common
Assignee: Alan Bishop <abishop>
Status: CLOSED CURRENTRELEASE
QA Contact: Luigi Toscano <ltoscano>
Severity: high
Docs Contact:
Priority: high
Version: 17.1 (Wallaby)
CC: mburns, slinaber
Target Milestone: beta
Keywords: Triaged
Target Release: 17.1
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-03 16:06:10 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1666804

Description Luigi Toscano 2023-02-07 19:10:45 UTC
Description of problem:
When deploying cinder-backup in active/active mode with the cinder-backup-active-active.yaml environment file, the deployment succeeds, but afterwards only one of the three cinder-backup services is up:


(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                     | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0             | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
| cinder-backup    | controller-1             | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
| cinder-backup    | controller-2             | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
| cinder-scheduler | controller-0             | nova | enabled | up    | 2023-02-06T21:31:42.000000 | -               |
| cinder-scheduler | controller-1             | nova | enabled | up    | 2023-02-06T21:31:36.000000 | -               |
| cinder-scheduler | controller-2             | nova | enabled | up    | 2023-02-06T21:31:38.000000 | -               |
| cinder-volume    | hostgroup@tripleo_netapp | nova | enabled | up    | 2023-02-06T21:31:44.000000 | -               |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
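
For anyone trying to reproduce the check, the table above can be scanned for down services with a few lines of Python. This is only a minimal sketch: the helper names (parse_service_list, down_services) are hypothetical, and it assumes the output is captured as plain text in the same pipe-delimited layout shown above.

```python
# Hypothetical helpers, not part of cinder or python-cinderclient.
# They only parse the text table printed by `cinder service-list`.

SAMPLE = """\
+------------------+--------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host         | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+--------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0 | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
| cinder-backup    | controller-1 | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
| cinder-backup    | controller-2 | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
| cinder-scheduler | controller-0 | nova | enabled | up    | 2023-02-06T21:31:42.000000 | -               |
+------------------+--------------+------+---------+-------+----------------------------+-----------------+
"""


def parse_service_list(text):
    """Turn the pipe-delimited table into a list of dicts keyed by column name."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +-----+ border lines and any prompt lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first pipe-delimited line is the header row
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def down_services(text, binary="cinder-backup"):
    """Return the hosts where the given binary is reported as down."""
    return [
        row["Host"]
        for row in parse_service_list(text)
        if row["Binary"] == binary and row["State"] == "down"
    ]


if __name__ == "__main__":
    print(down_services(SAMPLE))
```

With the sample taken from the output above, the two failing hosts are controller-0 and controller-1.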


Looking at the logs of the services, the failing ones contain:

    2023-02-07 01:48:18.068 33350 INFO cinder.service [-] Starting cinder-backup node (version 18.2.2)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service [-] Error starting thread.: tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 277, in _start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     fileutils.ensure_tree(a_dir)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_utils/fileutils.py", line 44, in ensure_tree
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     os.makedirs(path, mode)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib64/python3.9/os.py", line 225, in makedirs
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     mkdir(name, mode)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service PermissionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service During handling of the above exception, another exception occurred:
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 807, in run_service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     service.start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/service.py", line 220, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     coordination.COORDINATOR.start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/coordination.py", line 67, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self.coordinator.start(start_heart=True)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 689, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     super(CoordinationDriverWithExecutor, self).start(start_heart)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 426, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self._start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 279, in _start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     raise coordination.ToozConnectionError(e)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
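
The traceback shows the tooz file driver failing in os.makedirs() while creating '/var/lib/cinder/groups'. A rough way to check for this condition before starting the service is sketched below. The helper names (nearest_existing_ancestor, can_create_tree) are hypothetical, this is not the tooz code, and it is not the change that went into tripleo-common; it only tests whether the current user could create the directory tree.

```python
import os


def nearest_existing_ancestor(path):
    """Walk up from path until a component that exists on disk is found."""
    path = os.path.abspath(path)
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return path


def can_create_tree(path):
    """Return True if the current user could create path, or use it if present.

    os.makedirs() needs write and search permission on the closest existing
    ancestor directory; a missing bit there produces the PermissionError
    (Errno 13) seen in the traceback.
    """
    ancestor = nearest_existing_ancestor(path)
    return os.path.isdir(ancestor) and os.access(ancestor, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Path taken from the traceback; run this as the user the service runs as.
    print(can_create_tree("/var/lib/cinder/groups"))
```

Note that os.access() reports on the user running the script, so the result is only meaningful when run as the same user, and in the same container, as cinder-backup.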


This happens regardless of the cinder-backup backend.

An initial discussion with the developers (thanks Alan) suggests this may be the same as, or very similar to, an old cinder-volume issue: https://bugs.launchpad.net/tripleo/+bug/1908750

Version-Release number of selected component (if applicable):
python3-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch
openstack-tripleo-common-containers-15.4.1-1.20230119220943.4e21638.el9ost.noarch
openstack-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch

Comment 1 Alan Bishop 2023-02-28 17:55:24 UTC
No doc update required, as this is associated with a new feature in RHOSP.

Comment 2 Luigi Toscano 2023-03-03 16:06:10 UTC
With the latest set of available packages, the issue is fixed:

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-common
python3-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch
openstack-tripleo-common-containers-15.4.1-1.20230223221300.d447618.el9ost.noarch
openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch
(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                   | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0           | nova | enabled | up    | 2023-03-03T10:21:53.000000 | -               |
| cinder-backup    | controller-1           | nova | enabled | up    | 2023-03-03T10:21:58.000000 | -               |
| cinder-backup    | controller-2           | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
| cinder-scheduler | controller-0           | nova | enabled | up    | 2023-03-03T10:21:55.000000 | -               |
| cinder-scheduler | controller-1           | nova | enabled | up    | 2023-03-03T10:21:56.000000 | -               |
| cinder-scheduler | controller-2           | nova | enabled | up    | 2023-03-03T10:22:01.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+

This feature was never released, and the bug was found and fixed during the pre-release development and testing cycle, so it is not a regression. I'm going to close this bug.