Bug 1559105 - OC update does not set additional RBD Cinder backend on stack update
Summary: OC update does not set additional RBD Cinder backend on stack update
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: low
Target Milestone: z1
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: pkomarov
URL:
Whiteboard:
Keywords: Triaged, ZStream
Duplicates: 1596942
Depends On:
Blocks: 1579514
 
Reported: 2018-03-21 17:12 UTC by Yogev Rabl
Modified: 2018-07-19 14:28 UTC
CC List: 17 users

Doc Text:
Rerunning an overcloud deploy command against an existing overcloud failed to trigger a restart of any pacemaker-managed resource. For example, when adding a new service to haproxy, haproxy would not restart, rendering the newly configured service unavailable until a manual restart of the haproxy pacemaker resource.

With this update, a configuration change of any pacemaker resource is detected and the resource automatically restarts. Any change in the configuration of a pacemaker-managed resource is then reflected in the overcloud.
Clone Of:
Last Closed: 2018-07-19 14:27:12 UTC




External Trackers
Tracker ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:2214 None None None 2018-07-19 14:28 UTC
OpenStack gerrit 574263 None None None 2018-06-11 13:48 UTC
OpenStack gerrit 574264 None None None 2018-06-11 13:49 UTC
Red Hat Bugzilla 1593757 None CLOSED Firewall rules for octavia-api are not created on UPDATE 2019-07-03 13:52 UTC
Launchpad 1775196 None None None 2018-06-05 14:51 UTC

Internal Trackers: 1593757

Description Yogev Rabl 2018-03-21 17:12:56 UTC
Description of problem:
Updating the overcloud (OC) does not set the additional RBD Cinder backend in cinder.conf.
The deployment ends successfully, with no notification, warning, or error.

Version-Release number of selected component (if applicable):
puppet-tripleo-8.3.1-0.20180304033907.ed3285e.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-0.20180304011935.e427c90.el7ost.noarch
openstack-tripleo-validations-8.3.1-0.20180304031640.d5546cd.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-0.20180304005217.dabb361.el7ost.noarch
openstack-tripleo-heat-templates-8.0.0-0.20180304031146.6cd4184.el7ost.noarch
openstack-tripleo-common-8.5.1-0.20180304032202.e8d9da9.el7ost.noarch
openstack-tripleo-common-containers-8.5.1-0.20180304032202.e8d9da9.el7ost.noarch
python-tripleoclient-9.1.1-0.20180305094421.90727db.el7ost.noarch
ceph-ansible-3.1.0-0.1.beta3.el7.noarch

How reproducible:


Steps to Reproduce:
1. Deploy an overcloud with 1 Cinder RBD backend
2. Update the overcloud with 1 additional Ceph pool and an additional RBD backend (an example environment snippet follows below)
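
For reference, step 2 amounts to passing one small extra environment file to the deploy command (a sketch; CinderRbdExtraPools is the tripleo-heat-templates parameter for extra Cinder RBD pools, and the pool name volumes2 is illustrative):

parameter_defaults:
  CinderRbdExtraPools: volumes2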


Actual results:
Only one RBD backend is set in /etc/cinder/cinder.conf in the cinder-volume docker container

Expected results:
There are two backends, one for each pool (as set in the environment file)

Additional info:

Comment 2 Alan Bishop 2018-03-21 19:02:41 UTC
I looked at Yogev's system, and on the controller I see the hiera data for this feature [1] _has_ been updated.

[root@controller-0 hieradata]# hiera -c /etc/puppet/hiera.yaml tripleo::profile::base::cinder::volume::rbd::cinder_rbd_extra_pools
["volumes2"]

However, /etc/ceph/ceph.client.openstack.keyring has not been updated, and the puppet-tripleo code that handles the feature [2] does not seem to have executed.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/cinder-volume.yaml#L153
[2] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/cinder/volume/rbd.pp#L76
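
A quick check on the controller makes the mismatch visible (a sketch; volumes2 is the pool reported by hiera above, and the client caps in the keyring would have to mention the new pool for cinder to use it):

[root@controller-0 hieradata]# grep volumes2 /etc/ceph/ceph.client.openstack.keyring
(no output, i.e. the caps were never extended to the new pool)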

Comment 3 John Fulton 2018-03-23 17:22:12 UTC
To reproduce:
- deploy OSP13
- add an additional pool in THT
- redeploy the overcloud so that TripleO applies the update described in THT; i.e., ceph-ansible should add the additional pool

Comment 4 Alan Bishop 2018-03-23 17:36:29 UTC
(In reply to John Fulton from comment #3)
> To reproduce:
> - deploy OSP13
> - add an additional pool in THT
> - redeploy the overcloud so that TripleO applies the update described in
> THT; i.e., ceph-ansible should add the additional pool

And just for completeness: puppet needs to configure the cinder backend. I mention this so we don't focus only on ceph-ansible. It looks to me like changes to THT aren't propagating (ceph-ansible and puppet aren't being poked).

Comment 5 Giulio Fidente 2018-03-23 18:25:26 UTC
I think I reproduced the issue by adding just the following:

parameter_defaults:
  CinderRbdExtraPools: myotherpool

on the second 'openstack deploy' attempt. There are two separate issues:

1) ceph-ansible does not refresh the client.openstack keyring, see BZ #1560022

2) the cinder config data in /var/lib/config-data/cinder is refreshed but the cinder-volume container is not restarted; we can use this bug to track the container restart
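
Issue 2 can be seen on a controller by comparing the freshly rendered config against the age of the running container (a sketch; the config path is the one mentioned above, and the docker ps --format placeholders are standard):

# the rendered config is refreshed by the stack update...
stat -c '%y %n' /var/lib/config-data/cinder/etc/cinder/cinder.conf
# ...while the pacemaker-managed container keeps running with the old config
docker ps --format '{{.Names}}: up {{.RunningFor}}' | grep cinder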

Comment 6 James Slagle 2018-03-26 13:48:34 UTC
Please attach /var/log/messages from the node running cinder-volume where you saw this issue occur.

Comment 7 Martin André 2018-03-26 13:59:46 UTC
(In reply to Giulio Fidente from comment #5)
> 2) the cinder config data in /var/lib/config-data/cinder is refreshed but
> the cinder-volume container is not restarted; we can use this bug to track
> the container restart

At least in my environment the cinder-volume container mounts files from /var/lib/config-data/puppet-generated/cinder/, are these files also being updated on the second openstack deploy?
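
One way to answer that on a controller (a sketch; the replica name openstack-cinder-volume-docker-0 is illustrative, following the bundle naming seen elsewhere in this bug):

# list the bind mounts of the running replica and keep the config-data ones
docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{println}}{{end}}' \
  openstack-cinder-volume-docker-0 | grep config-data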

Comment 8 James Slagle 2018-03-26 14:59:26 UTC
It seems the issue here is that there is no mechanism in place to trigger a pacemaker restart of the bundle containers on config change.

This could be fixed in different ways, but first we need to decide on the right fix. Is it appropriate/necessary to run the puppet code executed by the init bundle containers (such as cinder_volume_init_bundle) on every stack update?

If so, and it's indeed the case that re-executing that puppet would cause pacemaker to restart the affected containers on config change, then we could add a new option to paunch, such as run_always, which would mean always deleting the old container and running it again.

Another possibility would be to dummy mount /var/lib/config-data into the init bundle containers, which would trigger TRIPLEO_CONFIG_HASH handling and force paunch to rerun the container. But does that fully handle the scenario where something in the pacemaker config changed and we'd need to re-execute the puppet code even if /var/lib/config-data is the same?
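
For illustration, the dummy-mount option would amount to one extra volume entry in the init bundle's container definition (a sketch only; the key names follow the docker_config sections of tripleo-heat-templates, and this is not necessarily the approach that was eventually taken, see comment 14):

cinder_volume_init_bundle:
  start_order: 0
  detach: false
  net: host
  volumes:
    # dummy mount: pulls the config volume into TRIPLEO_CONFIG_HASH handling,
    # so paunch re-runs this container whenever the cinder config changes
    - /var/lib/config-data/puppet-generated/cinder:/var/lib/config-data/puppet-generated/cinder:ro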

Comment 12 Harry Rybacki 2018-06-04 11:58:24 UTC
Hey Michele, can we expedite this bug? It's blocking another bug [1] which is an OSP13 blocker. QE just kicked back our build after failing to verify it.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1579514

Comment 13 Michele Baldessari 2018-06-04 12:13:20 UTC
(Sorry, I am out this week.)

The fix for this is quite involved and complicated, so maybe a release note telling users to restart haproxy is what we should aim for to get https://bugzilla.redhat.com/show_bug.cgi?id=1579514 unblocked?
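
Such a release note would boil down to a manual restart of the pacemaker resource after the update, along these lines (a sketch; the resource name matches the haproxy-bundle-docker-N replicas seen in docker ps):

[root@controller-0 ~]# pcs resource restart haproxy-bundle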

Comment 14 Michele Baldessari 2018-06-07 12:02:42 UTC
(In reply to Harry Rybacki from comment #12)
> Hey Michele, can we expedite this bug? It's blocking another bug[1] which is
> an OSP13 blocker. QE just kicked back our build after failing to verify
> 
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1579514

OK, a short update here: the failure in [1] was a missing barbican container image, although I do presume the missing haproxy restart on config change will surface as a problem later as well; we did not get to test it.

Damien had an idea that is much less complex than our initial fix plan, and I am attaching a review that we tested over the last few days. Depending on how testing of [1] goes, we can then decide whether this one is a blocker as well.

Comment 22 Michele Baldessari 2018-07-04 15:01:58 UTC
*** Bug 1596942 has been marked as a duplicate of this bug. ***

Comment 24 pkomarov 2018-07-11 11:55:51 UTC
Verified,

On puddle=2018-07-03.3 and openstack-tripleo-heat-templates-8.0.2-43

Before minor update:

[root@controller-0 ~]# docker exec -it haproxy-bundle-docker-1 bash
()[root@controller-0 /]# cat /etc/haproxy/haproxy.cfg|grep 1111

...

[root@controller-0 ~]# docker ps |grep haproxy-bundle-docker-1
f53e56e07a61        192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest                       "/bin/bash /usr/lo..."   5 days ago          Up 5 days                                 haproxy-bundle-docker-1

Minor update procedure:

cat > custom_params.yaml <<EOF
parameter_defaults:
  ExtraConfig:
    tripleo::haproxy::haproxy_globals_override:
      'maxconn': 1111
EOF

echo -e /home/stack/custom_params.yaml >> overcloud_deploy.sh

./overcloud_deploy.sh
...


After minor update:

Container restart verification:
[root@controller-0 ~]# docker ps|grep ha
f66137a34ef5        192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest                       "/bin/bash /usr/lo..."   2 hours ago         Up 2 hours                                 haproxy-bundle-docker-1

New config in place verification:

[root@controller-0 ~]# docker exec -it haproxy-bundle-docker-1 bash
()[root@controller-0 /]# cat /etc/haproxy/haproxy.cfg|grep 1111
  maxconn  1111
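
The restart can also be confirmed from the pacemaker side (a sketch; this prints the haproxy bundle and the state of its replicas):

[root@controller-0 ~]# pcs status | grep -A3 haproxy-bundle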

Comment 26 errata-xmlrpc 2018-07-19 14:27:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2214

