Description of problem:
The overcloud update does not set the additional RBD Cinder back end in cinder.conf. The deployment ended successfully without any notification, warning or error.

Version-Release number of selected component (if applicable):
puppet-tripleo-8.3.1-0.20180304033907.ed3285e.el7ost.noarch
openstack-tripleo-image-elements-8.0.0-0.20180304011935.e427c90.el7ost.noarch
openstack-tripleo-validations-8.3.1-0.20180304031640.d5546cd.el7ost.noarch
openstack-tripleo-puppet-elements-8.0.0-0.20180304005217.dabb361.el7ost.noarch
openstack-tripleo-heat-templates-8.0.0-0.20180304031146.6cd4184.el7ost.noarch
openstack-tripleo-common-8.5.1-0.20180304032202.e8d9da9.el7ost.noarch
openstack-tripleo-common-containers-8.5.1-0.20180304032202.e8d9da9.el7ost.noarch
python-tripleoclient-9.1.1-0.20180305094421.90727db.el7ost.noarch
ceph-ansible-3.1.0-0.1.beta3.el7.noarch

How reproducible:

Steps to Reproduce:
1. Deploy an overcloud with 1 Cinder RBD back end
2. Update the overcloud with 1 additional Ceph pool and an additional RBD back end

Actual results:
There is only 1 RBD back end set in /etc/cinder/cinder.conf in the cinder-volume docker container

Expected results:
There are two back ends, one for each pool (as set in the environment file)

Additional info:
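For illustration, a minimal environment file for step 2 might look like the following; the pool name is taken from the hiera data quoted in the next comment, and this is a sketch rather than the exact file used in this deployment:

parameter_defaults:
  # one additional Cinder RBD back end is expected per extra pool
  CinderRbdExtraPools: volumes2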
I looked at Yogev's system, and on the controller I see the hiera data for this feature [1] _has_ been updated:

[root@controller-0 hieradata]# hiera -c /etc/puppet/hiera.yaml tripleo::profile::base::cinder::volume::rbd::cinder_rbd_extra_pools
["volumes2"]

However, /etc/ceph/ceph.client.openstack.keyring has not been updated, and the puppet-tripleo code that handles the feature [2] doesn't seem to have executed.

[1] https://github.com/openstack/tripleo-heat-templates/blob/master/puppet/services/cinder-volume.yaml#L153
[2] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/cinder/volume/rbd.pp#L76
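For context, once the puppet-tripleo code in [2] runs, cinder.conf should end up with one RBD back end section per pool, roughly along the lines of the sketch below. The section and backend names here are illustrative; the actual names are whatever puppet-tripleo generates.

[tripleo_ceph]
volume_backend_name=tripleo_ceph
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=volumes
rbd_user=openstack

[tripleo_ceph_volumes2]
volume_backend_name=tripleo_ceph_volumes2
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_pool=volumes2
rbd_user=openstack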
To reproduce:
- deploy OSP13
- add an additional pool in THT
- redeploy the overcloud so that TripleO applies the update described in THT; i.e. ceph-ansible should add the additional pool
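An illustrative command sequence for the last two steps, assuming the original deploy command is simply re-run with one extra environment file appended (the file name and deploy options are placeholders, not the exact command used here):

cat > cinder_extra_pool.yaml <<EOF
parameter_defaults:
  CinderRbdExtraPools: volumes2
EOF

openstack overcloud deploy --templates \
  <original environment files> \
  -e cinder_extra_pool.yaml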
(In reply to John Fulton from comment #3)
> To reproduce:
> - deploy OSP13
> - add an additional pool in THT
> - redeploy the overcloud so that TripleO applies the update described in
> THT; i.e. ceph-ansible should add the additional pool

And just for completeness, puppet needs to configure the cinder back end. I mention this so we don't focus on just ceph-ansible. It looks to me like changes to the THT aren't propagating (ceph-ansible and puppet aren't being poked).
I think I reproduced the issue by adding just the following:

parameter_defaults:
  CinderRbdExtraPools: myotherpool

on the second 'openstack deploy' attempt. There are two separate issues:

1) ceph-ansible does not refresh the client.openstack keyring, see BZ #1560022
2) the cinder config data in /var/lib/config-data/cinder is refreshed but the cinder-volume container is not restarted; we can use this bug to track the container restart
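A quick way to confirm point 2 on the controller; the config-data path and the bundle container name are assumptions based on a typical OSP13 layout (the pacemaker-managed container is usually named something like openstack-cinder-volume-docker-0):

# the regenerated config data contains the new pool after the second deploy
grep myotherpool /var/lib/config-data/cinder/etc/cinder/cinder.conf

# but the cinder-volume container has not been restarted since the first deploy
docker ps --format '{{.Names}} {{.Status}}' | grep cinder-volume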
please attach /var/log/messages from the node running cinder-volume where you saw this issue occur
(In reply to Giulio Fidente from comment #5)
> 2) the cinder config data in /var/lib/config-data/cinder is refreshed but
> the cinder-volume container is not restarted; we can use this bug to track
> the container restart

At least in my environment the cinder-volume container mounts files from /var/lib/config-data/puppet-generated/cinder/; are these files also being updated on the second openstack deploy?
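One way to check both questions, sketched with an assumed container name and paths:

# host paths actually mounted by the cinder-volume bundle container
docker inspect --format '{{range .Mounts}}{{.Source}} -> {{.Destination}}{{"\n"}}{{end}}' openstack-cinder-volume-docker-0

# compare the two copies of the generated config
diff -u /var/lib/config-data/cinder/etc/cinder/cinder.conf \
        /var/lib/config-data/puppet-generated/cinder/etc/cinder/cinder.conf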
It seems the issue here is that there is no mechanism in place to trigger pacemaker restarting the bundle containers on config change. This could be fixed in different ways, but first we need to decide on the right fix.

Is it appropriate/necessary to run the puppet code executed by the init bundle containers (such as cinder_volume_init_bundle) on every stack update? If so, and it's indeed the case that re-executing that puppet would cause pacemaker to restart the affected containers on config change, then we could add a new option to paunch such as run_always, meaning always delete the old container and run the container again.

Another possibility would be to dummy-mount /var/lib/config-data into the init bundle containers, which would trigger TRIPLEO_CONFIG_HASH handling and force paunch to rerun the container (sketched below). But does that fully handle the scenario where something in the pacemaker config changed and we'd need to re-execute the puppet code even if /var/lib/config-data is the same?
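A very rough sketch of the dummy-mount option, expressed as the kind of docker_config entry THT generates for an init bundle. Only the relevant keys are shown; the container name and volume list are illustrative, and this is not the actual change under review:

  cinder_volume_init_bundle:
    start_order: 0
    detach: false
    net: host
    volumes:
      # ... existing bind mounts kept as-is ...
      # hypothetical extra mount: makes the cinder config data part of what
      # paunch hashes, so a config change forces the init bundle to re-run
      - /var/lib/config-data/puppet-generated/cinder:/var/lib/config-data/puppet-generated/cinder:ro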
Hey Michele, can we expedite this bug? It's blocking another bug [1] which is an OSP13 blocker. QE just kicked back our build after failing to verify.

[1] - https://bugzilla.redhat.com/show_bug.cgi?id=1579514
(Sorry, I am out this week.) The fix for this is quite involved and complicated, so maybe a release note telling operators to restart haproxy is what we should aim for to get https://bugzilla.redhat.com/show_bug.cgi?id=1579514 unblocked?
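For reference, the manual workaround such a release note would describe is a pacemaker-driven restart of the bundle; a sketch, assuming the resource name matches the container naming seen in the verification output further down:

pcs resource restart haproxy-bundle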
(In reply to Harry Rybacki from comment #12)
> Hey Michele, can we expedite this bug? It's blocking another bug [1] which
> is an OSP13 blocker. QE just kicked back our build after failing to verify.
>
> [1] - https://bugzilla.redhat.com/show_bug.cgi?id=1579514

Ok, a short update here: the failure from [1] was a missing barbican container image. I do presume the missing haproxy restart on config change will show up as a problem later as well, but we did not get to test it. Damien had an idea that is much less complex than our initial plan, and I am attaching a review that we tested in the last few days. Depending on how the testing of [1] goes we can then decide whether this one is a blocker as well or not.
*** Bug 1596942 has been marked as a duplicate of this bug. ***
Verified on puddle 2018-07-03.3 with openstack-tripleo-heat-templates-8.0.2-43.

Before the minor update:

[root@controller-0 ~]# docker exec -it haproxy-bundle-docker-1 bash
()[root@controller-0 /]# cat /etc/haproxy/haproxy.cfg | grep 1111
...

[root@controller-0 ~]# docker ps | grep haproxy-bundle-docker-1
f53e56e07a61  192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest  "/bin/bash /usr/lo..."  5 days ago  Up 5 days  haproxy-bundle-docker-1

Minor update procedure:

cat > custom_params.yaml <<EOF
parameter_defaults:
  ExtraConfig:
    tripleo::haproxy::haproxy_globals_override:
      'maxconn': 1111
EOF

echo -e /home/stack/custom_params.yaml >> overcloud_deploy.sh
./overcloud_deploy.sh
...

After the minor update, verify the container was restarted:

[root@controller-0 ~]# docker ps | grep ha
f66137a34ef5  192.168.24.1:8787/rhosp13/openstack-haproxy:pcmklatest  "/bin/bash /usr/lo..."  2 hours ago  Up 2 hours  haproxy-bundle-docker-1

Verify the new config is in place:

[root@controller-0 ~]# docker exec -it haproxy-bundle-docker-1 bash
()[root@controller-0 /]# cat /etc/haproxy/haproxy.cfg | grep 1111
    maxconn 1111
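The same kind of check applies to the cinder-volume scenario this bug was originally opened for; a sketch, with the bundle container name assumed:

[root@controller-0 ~]# docker ps | grep cinder-volume
[root@controller-0 ~]# docker exec openstack-cinder-volume-docker-0 grep -E '^\[|rbd_pool' /etc/cinder/cinder.conf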
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2214