|Summary:||Upgrade fails because os-collect-config is restarted during yum update|
|Product:||Red Hat OpenStack||Reporter:||Jiri Stransky <jstransk>|
|Component:||os-collect-config||Assignee:||Jason Joyce <jjoyce>|
|Status:||CLOSED ERRATA||QA Contact:||Arik Chernetsky <achernet>|
|Version:||9.0 (Mitaka)||CC:||apevec, augol, dmacpher, jjoyce, jliberma, jstransk, lhh, mandreou, mburns, mlammon, morazi, nkrishna, nlevinki, ohochman, rhel-osp-director-maint, rlopez, sbaker, sclewis, srevivo, tvignaud|
|Target Milestone:||async||Keywords:||Reopened, Triaged|
|Target Release:||9.0 (Mitaka)|
|Fixed In Version:||os-collect-config-0.1.37-6.el7ost||Doc Type:||Bug Fix|
The "os-collect-config" service on the Overcloud restarted on an RPM update. This caused Overcloud updates to fail. This fix changes the behavior so that "os-collect-config" does not restart on an RPM update. The Overcloud updates now succeed after an update of "os-collect-config". Note that "os-collect-config" gracefully restarts itself when "os-refresh-config" runs, so the restart on update is not required.
|:||1377256 (view as bug list)||Environment:|
|Last Closed:||2016-09-16 12:59:21 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
|Bug Depends On:|
|Bug Blocks:||1333977, 1377256|
Description Jiri Stransky 2016-06-27 15:02:36 UTC
Description of problem:
Step 1 of the controller upgrade fails when upgrading OSP 8 to OSP 9 because os-collect-config is restarted during yum update. As a result, it doesn't finish the rest of the upgrade step 1 script and never reports the success of step 1 to Heat. Heat waits and times out.

Looking at dist-git, this looks like a packaging regression from fixed bug 1272254. It most probably has the same cause, and hopefully the same fix will apply.

Reference links:
https://bugzilla.redhat.com/show_bug.cgi?id=1272254#c2
https://review.gerrithub.io/#/c/249945/1

Version-Release number of selected component (if applicable):
original: os-collect-config-0.1.37-2.el7ost.noarch
updated: os-collect-config-0.1.37-4.el7ost.noarch
Comment 2 Jiri Stransky 2016-06-29 12:53:28 UTC
Hmm, so it seems the OSP 8 postun script has the same issue, and it's actually *that* postun script which gets executed during the upgrade to OSP 9.

So this fix is still important because it will fix minor updates of OSP 9 and the major upgrade from OSP 9 to OSP 10, but it doesn't fix the upgrade from OSP 8 to OSP 9.

We'll probably need to document an out-of-band (= out-of-heat) `sudo yum -y update os-collect-config` to be run on all nodes after the `overcloud deploy` that does the repository switch, but before the `overcloud deploy` that upgrades controllers.
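The out-of-band step proposed in this comment could be scripted roughly as follows. This is only a sketch: the `heat-admin` login and the node addresses are assumptions for illustration, not taken from this report, and the function only builds the `ssh` command lines rather than running them.

```python
import shlex

# The command quoted in the comment above.
UPDATE_CMD = "sudo yum -y update os-collect-config"


def build_update_commands(hosts, user="heat-admin"):
    """Build one ssh command per overcloud node (user name is an assumption)."""
    return [["ssh", f"{user}@{host}", UPDATE_CMD] for host in hosts]


if __name__ == "__main__":
    # Hypothetical node addresses; replace with the real overcloud nodes.
    for cmd in build_update_commands(["192.0.2.10", "192.0.2.11"]):
        print(shlex.join(cmd))
```

The commands would be run after the `overcloud deploy` that switches repositories and before the one that upgrades controllers, as the comment describes.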
Comment 3 Mike Burns 2016-06-29 17:11:34 UTC
(In reply to Jiri Stransky from comment #2)
> Hmm, so it seems the OSP 8 postun script has the same issue, and it's
> actually *that* postun script which gets executed during the upgrade to OSP 9.
>
> So this fix is still important because it will fix minor updates of OSP 9
> and the major upgrade from OSP 9 to OSP 10, but it doesn't fix the upgrade
> from OSP 8 to OSP 9.
>
> We'll probably need to document an out-of-band (= out-of-heat) `sudo yum -y
> update os-collect-config` to be run on all nodes after the `overcloud
> deploy` that does the repository switch, but before the `overcloud deploy`
> that upgrades controllers.

We already document that we have to update to the latest OSP 8 before upgrading to OSP 9, right? Should we just fix this in 8 and push it out quickly?
Comment 5 Jiri Stransky 2016-06-30 08:23:24 UTC
(In reply to Mike Burns from comment #3)
> We already document that we have to update to the latest osp 8 before
> upgrade to osp9, right? Should we just fix this in 8 and push it out
> quickly?

I'm not sure such a minor update to OSP 8 will work, though. It will probably fail with the same problem -- the minor update script will be triggered by Heat via os-collect-config, it will run yum update, which will restart os-collect-config at an inconvenient time, and the minor update will not finish and will not be reported to Heat, leaving the stack-update stuck. I'm not 100% sure here, but I don't think it would work.
Comment 7 Marios Andreou 2016-07-15 09:30:51 UTC
The real fix, as discussed with sbaker (some context below), is in the packaging. In particular, we need to add SendSIGKILL=no to the systemd unit file. Moving back to ON_DEV since we definitely need the packaging to happen for 8..9 upgrades.

---------------------context from sbaker:--------------------
"We had os-collect-config upgrades working in OSP 7, yes the postrun restarts the service, but the systemd unit file also has this:

KillMode=process
SendSIGKILL=no

This means that when os-collect-config gets restarted, the current running os-refresh-config will continue to completion. It's not ideal because os-refresh-config output stops getting logged to the journal, and you only find out what happened in the rest of the run if/when the various heat deployment resources get signaled.
...
> > This reminds me, when upgrading to the package which contains the SendSIGKILL=no fix my testing showed that when the postrun restart happens it uses the *new* unit file behavior. This means that if the SendSIGKILL=no fix is released into OSP 8 and 9 no special pre-upgrade handling is needed.
> >
> So you're telling me we fix this whole thing by simply shipping an os-collect-config RPM with the right stuff in the systemd unit file and everything just works?
>
> ... facepalm...
>
I remember now, that's how we fixed 7.3 upgrade.
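As a rough way to confirm that a given unit file carries the two settings quoted above, the check below parses unit file text and reports which of them are missing. It is a sketch under stated assumptions: the sample unit contents are hypothetical apart from the `KillMode=process` and `SendSIGKILL=no` lines, and `configparser` is used only as an approximate reader of systemd unit syntax.

```python
import configparser

# Hypothetical unit file for illustration; only the KillMode and SendSIGKILL
# lines come from the comment above.
SAMPLE_UNIT = """\
[Unit]
Description=Collect metadata and run hook commands.

[Service]
ExecStart=/usr/bin/os-collect-config
Restart=on-failure
KillMode=process
SendSIGKILL=no
"""

# Settings that let a running os-refresh-config survive the service restart.
REQUIRED = {"KillMode": "process", "SendSIGKILL": "no"}


def missing_settings(unit_text):
    """Return the required [Service] settings that are absent or different."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case; systemd keys are case-sensitive
    parser.read_string(unit_text)
    service = parser["Service"] if parser.has_section("Service") else {}
    return {
        key: expected
        for key, expected in REQUIRED.items()
        if service.get(key) != expected
    }


if __name__ == "__main__":
    print(missing_settings(SAMPLE_UNIT))
```

An empty result means both settings are present with the expected values. On a real node the text would come from the installed unit file, whose path is not given in this report.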
Comment 8 Steve Baker 2016-07-15 21:04:19 UTC
RDO fix posted, I'm on PTO next week so someone can take ownership of that change if anything needs fixing.
Comment 10 mlammon 2016-08-05 13:29:23 UTC
8.0 GA -> 9 upgrade

I followed the latest upgrade guide and finished on 02 AUG 16 (http://etherpad.corp.redhat.com/ospd9-upgrade).

Initial deployment:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 --neutron-network-type vxlan --neutron-tunnel-types vxlan --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml --ceph-storage-scale 1

[root@overcloud-controller-1 ~]# rpm -qa | grep os-collect-config
os-collect-config-0.1.37-6.el7ost.noarch
[root@overcloud-controller-1 ~]# date
Fri Aug 5 13:29:02 UTC 2016

I did not see this issue reported during this upgrade.
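The `rpm -qa` check above can be automated across nodes with a small helper that compares the installed package string against the "Fixed In Version" release. This is a simplified sketch, not a full RPM version comparison: it only handles the 0.1.37 version named in this report and compares the leading number of the release field. The function and constant names are illustrative.

```python
import re

# Splits a name-version-release.arch string as printed by `rpm -qa`.
NVRA = re.compile(
    r"^(?P<name>.+)-(?P<version>[^-]+)-(?P<release>[^-]+)\.(?P<arch>[^.]+)$"
)

FIXED_NAME = "os-collect-config"
FIXED_VERSION = "0.1.37"
FIXED_RELEASE = 6  # from "Fixed In Version: os-collect-config-0.1.37-6.el7ost"


def has_fix(package):
    """Return True if the package string is 0.1.37 with release >= 6."""
    match = NVRA.match(package.strip())
    if not match or match.group("name") != FIXED_NAME:
        return False
    if match.group("version") != FIXED_VERSION:
        return False  # other versions are out of scope for this sketch
    lead = re.match(r"\d+", match.group("release"))
    return bool(lead) and int(lead.group()) >= FIXED_RELEASE


if __name__ == "__main__":
    for pkg in (
        "os-collect-config-0.1.37-2.el7ost.noarch",
        "os-collect-config-0.1.37-6.el7ost.noarch",
    ):
        print(pkg, has_fix(pkg))
```

Feeding it the two package strings from this report distinguishes the original 0.1.37-2 build from the fixed 0.1.37-6 build.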
Comment 12 errata-xmlrpc 2016-08-11 11:33:43 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHEA-2016-1599.html