Puppet is responsible for managing many of the packages on the system, so when doing a yum update on the overcloud we should leave that part to Puppet to ensure that services are e.g. restarted in the correct order.

There are upstream patches to implement this:

https://review.openstack.org/#/c/194348 (Write package names out to flat files)
This allows us to make a list of packages that are Puppet's responsibility. Note that this depends on the changes for bug 1259900.

https://review.openstack.org/#/c/190918 (Ensure present/latest for puppet driven package updates)
This puts Puppet (instead of yum) in charge of updating packages that it knows about.
We'll likely also need: https://review.openstack.org/#/c/193394/ (wire in tripleo::packages)
Verification here would probably involve looking at the logs on the servers after they've been updated and confirming (1) that none of the packages managed by puppet are updated by the initial call to yum, (2) that puppet runs, and (3) that puppet pulls in any package updates to the packages it is managing. The ultimate test of this is really the CI that upgrades to the latest puddle and verifies with Tempest that everything still works afterwards.
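Check (1) above could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes yum log lines of the form "<timestamp> Updated: [epoch:]name-version-release.arch" (as typically found in /var/log/yum.log), and the function names are hypothetical. The set of Puppet-managed package names is supplied by the caller.

```python
import re

# Assumed yum.log line shape: "<timestamp> Updated: [epoch:]name-version-release.arch"
UPDATED_LINE = re.compile(r"Updated: (?:\d+:)?(\S+)")


def package_name(nevra):
    """Strip '-version-release.arch' from a name-version-release.arch string."""
    base = nevra.rsplit(".", 1)[0]      # drop the architecture suffix
    parts = base.rsplit("-", 2)         # name, version, release
    return parts[0] if len(parts) == 3 else base


def managed_packages_updated_by_yum(log_lines, managed):
    """Return the Puppet-managed packages that the yum log reports as updated.

    An empty result is what check (1) expects for the initial yum call.
    """
    hits = set()
    for line in log_lines:
        match = UPDATED_LINE.search(line)
        if match and package_name(match.group(1)) in managed:
            hits.add(package_name(match.group(1)))
    return sorted(hits)
```

Feeding it the lines written to the yum log during the initial update, together with the managed package names, should give an empty list if the exclusion worked.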
We're considering a different approach to this problem in light of the fact that we deploy with Pacemaker. See https://bugzilla.redhat.com/show_bug.cgi?id=1261921#c15
Need clarifications:
1) Which packages are puppet's responsibility, and how will yum know the difference?
2) Which logs should I be inspecting while the update runs?
3) What should I be interested to find in the logs?
4) What do you mean that we deploy with pacemaker? It's just a part that's in the deployment, right?

There is also a problem to test the updates in the latest puddle, since there is nothing that really needs updating. Is there a recommendation about what to do with that?
(In reply to Udi from comment #5)
> Need clarifications:
> 1) Which packages are puppet's responsibility, and how will yum know the
> difference?

All of the packages that Puppet knows about (i.e. they have corresponding modules in Puppet). This includes all of the OpenStack services for a start. Puppet writes lists of these packages in the directory /var/lib/tripleo/installed-packages/. Yum ignores them because we pass them with the --exclude flag when we run yum.

> 2) Which logs should I be inspecting while the update runs?

Yum & Puppet

> 3) What should I be interested to find in the logs?

"confirming (1) that none of the packages managed by puppet are updated by the initial call to yum, (2) that puppet runs, and (3) that puppet pulls in any package updates to the packages it is managing."

> 4) What do you mean that we deploy with pacemaker? It's just a part that's
> in the deployment, right?

I mean that Pacemaker (and not Puppet or systemd) manages the starting and stopping of services. So on controller nodes, we're ditching the idea of having Puppet update stuff in order to control the restart ordering - instead we'll have yum update everything and stop Pacemaker (and all the services it manages) before and restart it after. It's worth reading all the comments on bug 1261921, and in fact there is probably no point testing this one until the patch for that has landed, since it completely changes the update script.

> There is also a problem to test the updates in the latest puddle, since
> there is nothing that really needs updating. Is there a recommendation about
> what to do with that?

Maybe do the same as the CI - install the GA bits and then update to the latest puddle. (Unfortunately this probably involves installing the GA undercloud as well and then updating it first. It would be nice to be able to install just the old overcloud with a new undercloud and then only update the overcloud, but that's probably tricky to get right.)
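A minimal sketch of the exclusion mechanism described in the answer to question 1, not the actual update script: it reads package names from the flat files under /var/lib/tripleo/installed-packages/ and builds a yum command line that passes each one with --exclude. The one-name-per-line file format, the function names, and the exact "yum -y update" invocation are assumptions made for illustration.

```python
import os

# Directory named in the comment above; the file format inside it is assumed.
INSTALLED_PACKAGES_DIR = "/var/lib/tripleo/installed-packages"


def read_managed_packages(directory=INSTALLED_PACKAGES_DIR):
    """Collect package names from every flat file in the directory.

    Assumes one package name per line; blank lines are skipped.
    """
    packages = set()
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            for line in handle:
                name = line.strip()
                if name:
                    packages.add(name)
    return sorted(packages)


def build_yum_command(packages):
    """Build a yum update command that skips the Puppet-managed packages."""
    command = ["yum", "-y", "update"]
    command.extend("--exclude=" + name for name in packages)
    return command
```

The resulting list could be handed to subprocess.call() on a node; the packages left out here are the ones Puppet is then expected to update on its own run.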
Verified:

Environment:
openstack-tripleo-heat-templates-0.8.6-71.el7ost.noarch

So this narrows down to checking the compute nodes. Checked a compute node and verified that the packages meant to be excluded during the stack update were actually excluded and were updated later by Puppet.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:1862
*** Bug 1235705 has been marked as a duplicate of this bug. ***