Bug 2106643 - Fix Cinder's pacemaker services restart due to config changes
Summary: Fix Cinder's pacemaker services restart due to config changes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 17.0 (Wallaby)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 17.0
Assignee: Alan Bishop
QA Contact: Tzach Shefi
URL:
Whiteboard:
Depends On:
Blocks: 2106647
Reported: 2022-07-13 08:14 UTC by Tzach Shefi
Modified: 2022-09-21 12:24 UTC

Fixed In Version: openstack-tripleo-heat-templates-14.3.1-0.20220718160751.feca772.el9ost
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2106647
Environment:
Last Closed: 2022-09-21 12:23:49 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1981591 | 0 | None | None | None | 2022-07-13 14:38:52 UTC
OpenStack gerrit | 849710 | 0 | None | MERGED | Fix restarting cinder HA services on config change | 2022-07-18 15:58:04 UTC
OpenStack gerrit | 850106 | 0 | None | MERGED | Fix restarting cinder HA services on config change | 2022-07-19 12:54:55 UTC
Red Hat Issue Tracker | OSP-17590 | 0 | None | None | None | 2022-07-13 08:24:06 UTC
Red Hat Product Errata | RHEA-2022:6543 | 0 | None | None | None | 2022-09-21 12:24:21 UTC

Description Tzach Shefi 2022-07-13 08:14:00 UTC
Description of problem: Adding a second Ceph RBD pool to Cinder via an overcloud update may fail to restart the Cinder volume (c-vol) container after cinder.conf is updated, so the expected new Cinder backend is absent.

If we add the second pool yaml on initial overcloud deployment, the pool/backend is always added as expected. 

Version-Release number of selected component (if applicable):
openstack-tripleo-heat-templates-14.3.1-0.20220628111342.7c969c5.el9ost.noarch

Also hits 16.2 (will clone this bz for it), but shouldn't happen on 16.1.

How reproducible:
Every time.
Tested at least twice on OSP 17 and 16.2.

Steps to Reproduce:
1. On an existing deployment, add a yaml that should create/add a new RBD pool, for example:
$ cat extra_templates.yaml
parameter_defaults:
    CephPools:
      - {"name": vol2, "pg_num": 32, "pgp_num": 32, "application": rbd} 
    CinderRbdExtraPools:
      - vol2

2. After the overcloud update completes,
we confirm on the Ceph node that the new pool was created.
cinder.conf also includes the new pool/backend.
However, when running "cinder service-list", the new pool/backend isn't listed as either up or down; it's missing altogether.


3. If you then restart the c-vol container, the backend suddenly shows up in cinder service-list as expected with an "up" state, after which a volume can be successfully created on it.

Quoting Alan's RCA from an email:
the issue happens only when cinder-backup is enabled.
The tripleo-ansible role that handles restarting pacemaker services isn't aware that there are two separate pcmk cinder services to consider. It detects that cinder.conf has changed, and uses that information to restart the cinder-backup service. But when it then checks again for the cinder-volume service, it reaches the wrong conclusion. It doesn't restart the cinder-volume service because restarting c-bak tricked it into thinking c-vol was OK.
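
The failure mode in the RCA can be illustrated with a toy model. This is only a sketch and not the tripleo-ansible code: the function names, the state dictionaries, and the idea that both services compare cinder.conf against one shared "last seen" hash are assumptions, chosen to show how restarting one service can hide the need to restart the other.

```python
# Toy model of the failure mode; NOT the actual tripleo-ansible logic.
# Assumption: change detection is based on a stored config hash.

def restart_changed_shared(services, new_hash, state):
    """Buggy variant: one shared hash is consulted for every service."""
    restarted = []
    for svc in services:
        if state.get("cinder") != new_hash:
            restarted.append(svc)
            state["cinder"] = new_hash  # the change is now "consumed"
    return restarted


def restart_changed_per_service(services, new_hash, state):
    """Corrected variant: each service tracks the hash it last started with."""
    restarted = []
    for svc in services:
        if state.get(svc) != new_hash:
            restarted.append(svc)
            state[svc] = new_hash
    return restarted


services = ["cinder-backup", "cinder-volume"]
buggy = restart_changed_shared(services, "hash-v2", {"cinder": "hash-v1"})
fixed = restart_changed_per_service(
    services, "hash-v2", {svc: "hash-v1" for svc in services}
)
print(buggy)  # ['cinder-backup']
print(fixed)  # ['cinder-backup', 'cinder-volume']
```

In the shared-state variant only cinder-backup is restarted, which matches the symptom reported here: cinder.conf is updated on disk, but c-vol keeps running with the old configuration.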


Actual results:
The new pool is created on the Ceph side,
but c-vol doesn't notice the change, so the second backend isn't added
until we restart the c-vol container, which then pulls in the config changes and adds the missing second backend.


Expected results:
The second Ceph pool should show up as a second Cinder backend,
without having to manually restart the c-vol container.

Additional info:

Comment 3 Tzach Shefi 2022-07-25 13:25:43 UTC
Verified on:
openstack-tripleo-heat-templates-14.3.1-0.20220719171711.feca772.el9ost.noarch.rpm 


Deployed a basic Ceph job, which resulted in the single default tripleo_ceph backend:
| cinder-volume    | hostgroup@tripleo_ceph | nova | enabled | up    | 2022-07-25T12:46:21.000000 | -               |


Created a new yaml, added it to overcloud_deploy.sh and executed/updated the OC:

(overcloud) [stack@undercloud-0 ~]$ cat secondcephpool.yaml 
parameter_defaults:
    CephPools:
      - {"name": vol2, "pg_num": 32, "pgp_num": 32, "application": rbd} 
    CinderRbdExtraPools:
      - vol2


The result this time around is an expected and pleasing two back ends:

 [stack@undercloud-0 ~]$ cinder service-list
+------------------+-----------------------------+------+---------+-------+----------------------------+-----------------+
| Binary           | Host                        | Zone | Status  | State | Updated_at                 | Disabled Reason |
+------------------+-----------------------------+------+---------+-------+----------------------------+-----------------+
..
| cinder-volume    | hostgroup@tripleo_ceph      | nova | enabled | up    | 2022-07-25T13:21:12.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph_vol2 | nova | enabled | up    | 2022-07-25T13:21:12.000000 | -               |--> Yay we got the second backend. 
+------------------+-----------------------------+------+---------+-------+----------------------------+-----------------+

Works as expected, a very verified bz.
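
For repeated verification runs, the check above could be scripted. The following is a minimal sketch, assuming the ASCII table layout shown in this comment; the helper names volume_backends and missing_backends are hypothetical and not part of cinderclient.

```python
def volume_backends(service_list_output):
    """Return {host: state} for the cinder-volume rows of the ASCII table."""
    backends = {}
    for line in service_list_output.splitlines():
        cells = [c.strip() for c in line.strip().split("|")]
        # A data row splits into: '', Binary, Host, Zone, Status, State, ...
        if len(cells) >= 6 and cells[1] == "cinder-volume":
            backends[cells[2]] = cells[5]
    return backends


def missing_backends(service_list_output, expected_hosts):
    """List expected hosts that are absent or not reported as 'up'."""
    seen = volume_backends(service_list_output)
    return [host for host in expected_hosts if seen.get(host) != "up"]


sample = """
| cinder-volume    | hostgroup@tripleo_ceph      | nova | enabled | up    | 2022-07-25T13:21:12.000000 | -               |
| cinder-volume    | hostgroup@tripleo_ceph_vol2 | nova | enabled | up    | 2022-07-25T13:21:12.000000 | -               |
"""
expected = ["hostgroup@tripleo_ceph", "hostgroup@tripleo_ceph_vol2"]
print(missing_backends(sample, expected))  # []
```

An empty list means every expected backend is listed and up. Before the fix, the same check would have returned hostgroup@tripleo_ceph_vol2, since that row was missing from the output.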

Comment 7 errata-xmlrpc 2022-09-21 12:23:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Release of components for Red Hat OpenStack Platform 17.0 (Wallaby)), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2022:6543

