Bug 2167954 - cinder-backup in active/active mode only works on the controller where cinder-volume is running
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 17.1 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.1
Assignee: Alan Bishop
QA Contact: Luigi Toscano
URL:
Whiteboard:
Depends On:
Blocks: 1666804
Reported: 2023-02-07 19:10 UTC by Luigi Toscano
Modified: 2023-03-03 16:06 UTC (History)

Fixed In Version: openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-03 16:06:10 UTC
Target Upstream Version:
Embargoed:


Links
System                 ID         Private  Priority  Status  Summary                                         Last Updated
OpenStack gerrit       874527     0        None      MERGED  TCIB: Add cinder-backup extend_start.sh script  2023-03-02 14:32:39 UTC
Red Hat Issue Tracker  OSP-22117  0        None      None    None                                            2023-02-07 19:11:43 UTC

Description Luigi Toscano 2023-02-07 19:10:45 UTC
Description of problem:
When cinder-backup is deployed in active/active mode using the cinder-backup-active-active.yaml environment file, the deployment succeeds, but only one of the three cinder-backup services is up:


(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                     | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0             | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
| cinder-backup    | controller-1             | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
| cinder-backup    | controller-2             | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
| cinder-scheduler | controller-0             | nova | enabled | up    | 2023-02-06T21:31:42.000000 | -               |
| cinder-scheduler | controller-1             | nova | enabled | up    | 2023-02-06T21:31:36.000000 | -               |
| cinder-scheduler | controller-2             | nova | enabled | up    | 2023-02-06T21:31:38.000000 | -               |
| cinder-volume    | hostgroup@tripleo_netapp | nova | enabled | up    | 2023-02-06T21:31:44.000000 | -               |
+------------------+--------------------------+------+---------+-------+----------------------------+-----------------+
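
For anyone who wants to check this from a script rather than by eye, here is a minimal Python sketch that parses the table printed by `cinder service-list` and reports which cinder-backup hosts are down. The function names are hypothetical helpers written for this report, not part of any cinder tooling, and the sample is a shortened copy of the output above.

```python
def parse_service_list(text):
    """Parse the ASCII table printed by `cinder service-list` into dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines (+---+), prompts and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first data-like line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def down_services(rows, binary="cinder-backup"):
    """Return the hosts on which the given binary is reported as down."""
    return [r["Host"] for r in rows
            if r["Binary"] == binary and r["State"] == "down"]


# Shortened copy of the output shown in this report.
SAMPLE = """
+---------------+--------------+------+---------+-------+----------------------------+-----------------+
| Binary        | Host         | Zone | Status  | State | Updated_at                 | Disabled Reason |
+---------------+--------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup | controller-0 | nova | enabled | down  | 2023-02-06T20:58:30.000000 | -               |
| cinder-backup | controller-1 | nova | enabled | down  | 2023-02-06T20:58:16.000000 | -               |
| cinder-backup | controller-2 | nova | enabled | up    | 2023-02-06T21:31:40.000000 | -               |
+---------------+--------------+------+---------+-------+----------------------------+-----------------+
"""

if __name__ == "__main__":
    print(down_services(parse_service_list(SAMPLE)))
```

The parser relies only on the `|` column separators, so it should tolerate different column widths, but it is a sketch and has not been tried against other client versions.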


Looking at the logs of the services, the failing ones contain:

    2023-02-07 01:48:18.068 33350 INFO cinder.service [-] Starting cinder-backup node (version 18.2.2)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service [-] Error starting thread.: tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 277, in _start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     fileutils.ensure_tree(a_dir)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_utils/fileutils.py", line 44, in ensure_tree
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     os.makedirs(path, mode)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib64/python3.9/os.py", line 225, in makedirs
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     mkdir(name, mode)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service PermissionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service During handling of the above exception, another exception occurred:
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service Traceback (most recent call last):
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/oslo_service/service.py", line 807, in run_service
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     service.start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/service.py", line 220, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     coordination.COORDINATOR.start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/cinder/coordination.py", line 67, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self.coordinator.start(start_heart=True)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 689, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     super(CoordinationDriverWithExecutor, self).start(start_heart)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/coordination.py", line 426, in start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     self._start()
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service   File "/usr/lib/python3.9/site-packages/tooz/drivers/file.py", line 279, in _start
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service     raise coordination.ToozConnectionError(e)
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service tooz.coordination.ToozConnectionError: [Errno 13] Permission denied: '/var/lib/cinder/groups'
    2023-02-07 01:48:18.098 33350 ERROR oslo_service.service


This happens regardless of the cinder-backup backend.
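
The traceback shows the tooz file driver failing in `os.makedirs` while creating `/var/lib/cinder/groups`, which points at the ownership or mode of an existing parent directory inside the container. A minimal Python sketch for checking that, assuming it is run inside the affected cinder-backup container as the service user (the helper names are hypothetical, and the path is the one from the log above):

```python
import os
import stat


def nearest_existing_ancestor(path):
    """Walk up from path until something that exists is found."""
    current = os.path.abspath(path)
    while not os.path.exists(current):
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return current


def describe_creation_target(path):
    """Report who owns the directory in which `path` would be created."""
    ancestor = nearest_existing_ancestor(path)
    info = os.stat(ancestor)
    return {
        "ancestor": ancestor,
        "owner_uid": info.st_uid,
        "owner_gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),
        # Creating a subdirectory needs write and search permission here.
        # Note: this check normally succeeds when run as root.
        "writable_by_current_user": os.access(ancestor, os.W_OK | os.X_OK),
    }


if __name__ == "__main__":
    # Path taken from the ToozConnectionError in the log.
    for key, value in describe_creation_target("/var/lib/cinder/groups").items():
        print(f"{key}: {value}")
```

If `writable_by_current_user` is false for the service user, the `PermissionError` in the log is the expected result. This only diagnoses the symptom; it does not reproduce what the merged fix does.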

An initial discussion with the developers (thanks Alan) suggests this may be the same as, or very similar to, an old cinder-volume issue: https://bugs.launchpad.net/tripleo/+bug/1908750

Version-Release number of selected component (if applicable):
python3-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch
openstack-tripleo-common-containers-15.4.1-1.20230119220943.4e21638.el9ost.noarch
openstack-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch

Comment 1 Alan Bishop 2023-02-28 17:55:24 UTC
No doc update required, as this is associated with a new feature in RHOSP.

Comment 2 Luigi Toscano 2023-03-03 16:06:10 UTC
With the latest set of available packages, the issue is fixed:

(overcloud) [stack@undercloud-0 ~]$ rpm -qa | grep tripleo-common
python3-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch
openstack-tripleo-common-containers-15.4.1-1.20230223221300.d447618.el9ost.noarch
openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch
(overcloud) [stack@undercloud-0 ~]$ cinder service-list
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                   | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
| cinder-backup    | controller-0           | nova | enabled | up    | 2023-03-03T10:21:53.000000 | -               |
| cinder-backup    | controller-1           | nova | enabled | up    | 2023-03-03T10:21:58.000000 | -               |
| cinder-backup    | controller-2           | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
| cinder-scheduler | controller-0           | nova | enabled | up    | 2023-03-03T10:21:55.000000 | -               |
| cinder-scheduler | controller-1           | nova | enabled | up    | 2023-03-03T10:21:56.000000 | -               |
| cinder-scheduler | controller-2           | nova | enabled | up    | 2023-03-03T10:22:01.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2023-03-03T10:21:52.000000 | -               |
+------------------+------------------------+------+---------+-------+----------------------------+-----------------+
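
To compare an installed package against the "Fixed In Version" field without reading the strings by hand, here is a rough Python sketch. It assumes the 14-digit field in the release string is a snapshot timestamp and that, within the same 15.4.1 version, a later timestamp includes the fix; both are assumptions about the packaging scheme, and the helper names are hypothetical.

```python
import re

# Matches ".<14-digit timestamp>.<short commit hash>." inside a release string.
SNAPSHOT_RE = re.compile(r"\.(\d{14})\.([0-9a-f]{7,})\.")

# Value of the "Fixed In Version" field of this bug.
FIXED_IN = "openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost"


def snapshot_of(package):
    """Return (timestamp, short commit hash) embedded in a package string."""
    match = SNAPSHOT_RE.search(package)
    if match is None:
        raise ValueError(f"no snapshot timestamp found in {package!r}")
    return match.group(1), match.group(2)


def has_fix(installed, fixed_in=FIXED_IN):
    """True if the installed snapshot is at least as new as the fixed one."""
    # Equal-length digit strings compare correctly as plain strings.
    return snapshot_of(installed)[0] >= snapshot_of(fixed_in)[0]


if __name__ == "__main__":
    for package in (
        "openstack-tripleo-common-15.4.1-1.20230119220943.4e21638.el9ost.noarch",
        "openstack-tripleo-common-15.4.1-1.20230223221300.d447618.el9ost.noarch",
    ):
        print(package, "->", "fixed" if has_fix(package) else "affected")
```

The two sample strings are the package versions quoted in the description and in this comment.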

This feature was never released, and the bug was found and fixed during the pre-release development and testing cycle, so it is not a regression. I'm closing this bug.

