Bug 1350489 - Upgrade fails because os-collect-config is restarted during yum update
Summary: Upgrade fails because os-collect-config is restarted during yum update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: os-collect-config
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: async
Target Release: 9.0 (Mitaka)
Assignee: Jason Joyce
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks: 1333977 1377256
Reported: 2016-06-27 15:02 UTC by Jiri Stransky
Modified: 2019-12-16 06:03 UTC (History)

Fixed In Version: os-collect-config-0.1.37-6.el7ost
Doc Type: Bug Fix
Doc Text:
The "os-collect-config" service on the Overcloud restarted on an RPM update. This caused Overcloud updates to fail. This fix changes the behavior so that "os-collect-config" does not restart on an RPM update. The Overcloud updates now succeed after an update of "os-collect-config". Note that "os-collect-config" gracefully restarts itself when "os-refresh-config" runs, so the restart on update is not required.
Clone Of:
Clones: 1377256
Environment:
Last Closed: 2016-09-16 12:59:21 UTC
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Launchpad | 1603144 | 0 | None | None | None | 2016-07-14 16:56:16 UTC
RDO | 1678 | 0 | None | None | None | 2016-07-15 21:03:05 UTC
Red Hat Product Errata | RHEA-2016:1599 | 0 | normal | SHIPPED_LIVE | Red Hat OpenStack Platform 9 director Release Candidate Advisory | 2016-08-11 15:25:37 UTC

Description Jiri Stransky 2016-06-27 15:02:36 UTC
Description of problem:

Step 1 of the controller upgrade fails when upgrading OSP 8 to OSP 9 because os-collect-config is restarted during yum update. As a result, the rest of the upgrade step 1 script does not finish and the success of step 1 is never reported to Heat. Heat waits and times out.

Looking at dist-git, this looks like a packaging regression of fixed bug 1272254. It most probably has the same cause, and hopefully the same fix will apply.

Reference links:

https://bugzilla.redhat.com/show_bug.cgi?id=1272254#c2
https://review.gerrithub.io/#/c/249945/1

Version-Release number of selected component (if applicable):

original: os-collect-config-0.1.37-2.el7ost.noarch
updated:  os-collect-config-0.1.37-4.el7ost.noarch

Comment 2 Jiri Stransky 2016-06-29 12:53:28 UTC
Hmm so it seems the OSP 8 postun script has the same issue, and it's actually *that* postun script which gets executed during upgrade to OSP 9.

So this fix is still important because it will fix minor updates of OSP 9 and major upgrade from OSP 9 to OSP 10, but it doesn't fix the upgrade from OSP 8 to OSP 9.

We'll probably need to document an out-of-band (= out-of-heat) `sudo yum -y update os-collect-config` to be run on all nodes after the `overcloud deploy` that does the repository switch, but before the `overcloud deploy` that upgrades controllers.
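
A minimal sketch of what that out-of-band step could look like, for illustration only. The SSH user (`heat-admin`) and the node addresses are assumptions, not values taken from this report, and the function `print_update_commands` is a hypothetical helper. It only prints one command per node so the list can be reviewed before anything is run:

```shell
#!/bin/sh
# Assumption: nodes are reachable over SSH as this user; override as needed.
SSH_USER="${SSH_USER:-heat-admin}"

# Print (do not execute) the out-of-band update command for each node
# address given as an argument.
print_update_commands() {
    for node in "$@"; do
        printf 'ssh %s@%s sudo yum -y update os-collect-config\n' "$SSH_USER" "$node"
    done
}

# Example with placeholder addresses (documentation range 192.0.2.0/24):
# print_update_commands 192.0.2.10 192.0.2.11
```

The commands would be run between the two `overcloud deploy` invocations described above, that is, after the repository switch and before the controller upgrade.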

Comment 3 Mike Burns 2016-06-29 17:11:34 UTC
(In reply to Jiri Stransky from comment #2)
> Hmm so it seems the OSP 8 postun script has the same issue, and it's
> actually *that* postun script which gets executed during upgrade to OSP 9.
> 
> So this fix is still important because it will fix minor updates of OSP 9
> and major upgrade from OSP 9 to OSP 10, but it doesn't fix the upgrade from
> OSP 8 to OSP 9.
> 
> We'll probably need to document an out-of-band (= out-of-heat) `sudo yum -y
> update os-collect-config` to be run on all nodes after the `overcloud
> deploy` that does the repository switch, but before the `overcloud deploy`
> that upgrades controllers.

We already document that we have to update to the latest OSP 8 before upgrade to OSP 9, right? Should we just fix this in 8 and push it out quickly?

Comment 5 Jiri Stransky 2016-06-30 08:23:24 UTC
(In reply to Mike Burns from comment #3)
> We already document that we have to update to the latest OSP 8 before
> upgrade to OSP 9, right? Should we just fix this in 8 and push it out
> quickly?

I'm not sure such a minor update to OSP 8 will work, though. It will probably fail with the same problem: the minor update script will be triggered by Heat via os-collect-config and will run yum update, which will restart os-collect-config at an inconvenient time. The minor update will then not finish and will not be reported to Heat, leaving the stack-update stuck. I'm not 100% sure here, but I don't think it would work.

Comment 7 Marios Andreou 2016-07-15 09:30:51 UTC
The real fix, as discussed with sbaker (some context below), is in the packaging: in particular, we need to add SendSIGKILL=no to the systemd unit file. Moving back to ON_DEV since we definitely need the packaging to happen for 8..9 upgrades.

---------------------context from sbaker:--------------------


"We had os-collect-config upgrades working in OSP 7, yes the postrun
restarts the service, but the systemd unit file also has this:

     KillMode=process
     SendSIGKILL=no

This means that when os-collect-config gets restarted, the current
running os-refresh-config will continue to completion. It's not ideal
because os-refresh-config output stops getting logged to the journal,
and you only find out what happened in the rest of the run if/when the
various heat deployment resources get signaled. 
...

> > This reminds me, when upgrading to the package which contains the SendSIGKILL=no fix my testing showed that when the postrun restart happens it uses the *new* unit file behavior. This means that if the SendSIGKILL=no fix is released into OSP 8 and 9 no special pre-upgrade handling is needed.
>

>
> So you're telling me we fix this whole thing by simply shipping an os-collect-config RPM with the right stuff in the systemd unit file and everything just works?
>
> ... facepalm...
>
I remember now, that's how we fixed the 7.3 upgrade.
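
One way to confirm that a node's unit carries both settings quoted above is to inspect the properties systemd reports for the service. The sketch below is an illustration, not part of the packaging fix. The helper name `has_safe_kill_settings` is hypothetical, and the unit name `os-collect-config` is assumed to match the service name used in this report. The helper reads `Key=Value` lines on stdin and succeeds only if both `KillMode=process` and `SendSIGKILL=no` are present:

```shell
#!/bin/sh
# Succeed only if stdin contains both settings from the unit file
# discussed above, each as a whole line.
has_safe_kill_settings() {
    input=$(cat)
    printf '%s\n' "$input" | grep -qx 'KillMode=process' &&
        printf '%s\n' "$input" | grep -qx 'SendSIGKILL=no'
}

# Usage on a node (assumes the unit is named os-collect-config):
# systemctl show os-collect-config -p KillMode -p SendSIGKILL \
#     | has_safe_kill_settings && echo "unit has the expected settings"
```

Taking the properties on stdin keeps the check independent of where the unit file is installed.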

Comment 8 Steve Baker 2016-07-15 21:04:19 UTC
RDO fix posted. I'm on PTO next week, so someone can take ownership of that change if anything needs fixing.

Comment 10 mlammon 2016-08-05 13:29:23 UTC
8.0 GA -> 9 upgrade
I followed the latest upgrade guide and finished on 02 Aug 2016
(http://etherpad.corp.redhat.com/ospd9-upgrade)

Initial deployment:
openstack overcloud deploy --templates \
  --control-scale 3 --compute-scale 1 \
  --neutron-network-type vxlan --neutron-tunnel-types vxlan \
  --ntp-server clock.redhat.com --timeout 90 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e network-environment.yaml \
  --ceph-storage-scale 1

[root@overcloud-controller-1 ~]# rpm -qa | grep os-collect-config
os-collect-config-0.1.37-6.el7ost.noarch
[root@overcloud-controller-1 ~]# date
Fri Aug  5 13:29:02 UTC 2016

I did not see this issue reported during this upgrade.
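
The version check above can be scripted roughly as follows. This is a sketch under assumptions: it relies on GNU `sort -V`, which only approximates RPM's own version comparison, and the helper name `is_fixed` is hypothetical. The threshold is the "Fixed In Version" value from this report:

```shell
#!/bin/sh
# Version-release of the build listed under "Fixed In Version".
FIXED="0.1.37-6.el7ost"

# Succeed when the version-release string in $1 sorts at or after FIXED.
# Assumption: GNU sort -V ordering is close enough to RPM ordering here.
is_fixed() {
    lowest=$(printf '%s\n%s\n' "$FIXED" "$1" | sort -V | head -n 1)
    [ "$lowest" = "$FIXED" ]
}

# Usage on a node:
# is_fixed "$(rpm -q --qf '%{VERSION}-%{RELEASE}' os-collect-config)" \
#     && echo "fixed build installed"
```

With this check, the 0.1.37-2.el7ost and 0.1.37-4.el7ost builds named in the description are reported as not fixed.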

Comment 12 errata-xmlrpc 2016-08-11 11:33:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1599.html

