1350489 – Upgrade fails because os-collect-config is restarted during yum update

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	os-collect-config
Sub Component:
Version:	9.0 (Mitaka)
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	unspecified
Target Milestone:	async
Target Release:	9.0 (Mitaka)
Assignee:	Jason Joyce
QA Contact:	Arik Chernetsky
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1333977 1377256

Reported:	2016-06-27 15:02 UTC by Jiri Stransky
Modified:	2019-12-16 06:03 UTC
CC List:	20 users
Fixed In Version:	os-collect-config-0.1.37-6.el7ost
Doc Type:	Bug Fix
Doc Text:	The "os-collect-config" service on the Overcloud restarted on an RPM update. This caused Overcloud updates to fail. This fix changes the behavior so that "os-collect-config" does not restart on an RPM update. The Overcloud updates now succeed after an update of "os-collect-config". Note that "os-collect-config" gracefully restarts itself when "os-refresh-config" runs, so the restart on update is not required.
Clone Of:
Clones:	1377256
Environment:
Last Closed:	2016-09-16 12:59:21 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1603144	None	None	None	2016-07-14 16:56:16 UTC
RDO	1678	None	None	None	2016-07-15 21:03:05 UTC
Red Hat Product Errata	RHEA-2016:1599	normal	SHIPPED_LIVE	Red Hat OpenStack Platform 9 director Release Candidate Advisory	2016-08-11 15:25:37 UTC

Description Jiri Stransky 2016-06-27 15:02:36 UTC

Description of problem:

Step 1 of the controller upgrade fails when upgrading OSP 8 to OSP 9 because os-collect-config is restarted during yum update. As a result it doesn't finish the rest of the upgrade step 1 script and never reports the success of step 1 to Heat. Heat waits and times out.

Looking at dist-git, this looks like a packaging regression of fixed bug 1272254. It most probably has the same cause, and hopefully the same fix will apply.

Reference links:

https://bugzilla.redhat.com/show_bug.cgi?id=1272254#c2
https://review.gerrithub.io/#/c/249945/1

Version-Release number of selected component (if applicable):

original: os-collect-config-0.1.37-2.el7ost.noarch
updated:  os-collect-config-0.1.37-4.el7ost.noarch

Comment 2 Jiri Stransky 2016-06-29 12:53:28 UTC

Hmm so it seems the OSP 8 postun script has the same issue, and it's actually *that* postun script which gets executed during upgrade to OSP 9.
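To check which scriptlets an installed package carries, `rpm -q --scripts` prints them. The sketch below is only an illustration: the helper name `scriptlets_restart_service` is made up here, and it assumes the restart is issued through `systemctl restart` or `systemctl try-restart`, which may not match every package.

```shell
# Sketch: report whether RPM scriptlet text restarts a systemd service.
# Assumption: the restart is issued via "systemctl restart" or
# "systemctl try-restart"; other restart mechanisms are not detected.
scriptlets_restart_service() {
    # Reads scriptlet text on stdin; succeeds if a restart call is present.
    grep -Eq 'systemctl[[:space:]]+(try-)?restart'
}

# Intended use on a node (not run here):
#   rpm -q --scripts os-collect-config | scriptlets_restart_service \
#       && echo "package scriptlets restart the service"
```

Keep in mind that during an upgrade it is the scriptlets of the package being replaced that matter for postun, so the check is most useful when run before the update.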

So this fix is still important because it will fix minor updates of OSP 9 and major upgrade from OSP 9 to OSP 10, but it doesn't fix the upgrade from OSP 8 to OSP 9.

We'll probably need to document an out-of-band (= out-of-heat) `sudo yum -y update os-collect-config` to be run on all nodes after the `overcloud deploy` that does the repository switch, but before the `overcloud deploy` that upgrades controllers.
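A rough sketch of what that out-of-band step could look like when driven from the undercloud. The function name `update_occ_on_nodes`, the `DRY_RUN` switch and the node addresses are illustrative only, and it assumes the nodes are reachable over SSH as the `heat-admin` user with passwordless sudo.

```shell
# Sketch: update os-collect-config outside of Heat on every overcloud node.
# Assumptions: nodes accept SSH as heat-admin with passwordless sudo, and the
# node addresses are passed as arguments. Set DRY_RUN=1 to only print commands.
update_occ_on_nodes() {
    local node rc=0
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh heat-admin@${node} sudo yum -y update os-collect-config"
        else
            # Remember a failure but keep going so every node is attempted.
            ssh "heat-admin@${node}" sudo yum -y update os-collect-config || rc=1
        fi
    done
    return "$rc"
}
```

For example, `DRY_RUN=1 update_occ_on_nodes 192.0.2.10 192.0.2.11` prints one ssh command per node without touching anything (the addresses are placeholders).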

Comment 3 Mike Burns 2016-06-29 17:11:34 UTC

(In reply to Jiri Stransky from comment #2)
> Hmm so it seems the OSP 8 postun script has the same issue, and it's
> actually *that* postun script which gets executed during upgrade to OSP 9.
> 
> So this fix is still important because it will fix minor updates of OSP 9
> and major upgrade from OSP 9 to OSP 10, but it doesn't fix the upgrade from
> OSP 8 to OSP 9.
> 
> We'll probably need to document an out-of-band (= out-of-heat) `sudo yum -y
> update os-collect-config` to be run on all nodes after the `overcloud
> deploy` that does the repository switch, but before the `overcloud deploy`
> that upgrades controllers.

We already document that we have to update to the latest OSP 8 before upgrading to OSP 9, right? Should we just fix this in 8 and push it out quickly?

Comment 5 Jiri Stransky 2016-06-30 08:23:24 UTC

(In reply to Mike Burns from comment #3)
> We already document that we have to update to the latest osp 8 before
> upgrade to osp9, right?  Should we just fix this in 8 and push it out
> quickly?

I'm not sure if such a minor update to OSP 8 will work though. It will probably fail with the same problem -- the minor update script will be triggered by Heat via os-collect-config, it will run yum update, which will restart os-collect-config at an inconvenient time, and the minor update will not finish and will not be reported to Heat, leaving the stack-update stuck. Not 100% sure here but I don't think it would work.

Comment 7 Marios Andreou 2016-07-15 09:30:51 UTC

The real fix, as discussed with sbaker (some context below), is in the packaging: in particular, we need to add SendSIGKILL=no to the systemd unit file. Moving back to ON_DEV since we definitely need the packaging to happen for 8..9 upgrades.

---------------------context from sbaker:--------------------


"We had os-collect-config upgrades working in OSP 7, yes the postrun
restarts the service, but the systemd unit file also has this:

     KillMode=process
     SendSIGKILL=no

This means that when os-collect-config gets restarted, the current
running os-refresh-config will continue to completion. It's not ideal
because os-refresh-config output stops getting logged to the journal,
and you only find out what happened in the rest of the run if/when the
various heat deployment resources get signaled. 
...

> > This reminds me, when upgrading to the package which contains the SendSIGKILL=no fix, my testing showed that when the postrun restart happens it uses the *new* unit file behavior. This means that if the SendSIGKILL=no fix is released into OSP 8 and 9, no special pre-upgrade handling is needed.
>
> So you're telling me we fix this whole thing by simply shipping an os-collect-config RPM with the right stuff in the systemd unit file and everything just works?
>
> ... facepalm...

I remember now, that's how we fixed the 7.3 upgrade.
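For anyone wanting to confirm that a node has a unit file with both settings quoted in this comment (KillMode=process and SendSIGKILL=no), a minimal check might look like the following. The helper name `unit_survives_restart` is hypothetical, and the sketch only matches the literal value `no`, although systemd also accepts other boolean spellings.

```shell
# Sketch: succeed only if a unit file sets both KillMode=process and
# SendSIGKILL=no. Assumption: values are spelled exactly as quoted in this
# bug; equivalent booleans such as "false" or "0" are not recognised here.
unit_survives_restart() {
    local unit_file="$1"
    grep -Eq '^[[:space:]]*KillMode[[:space:]]*=[[:space:]]*process[[:space:]]*$' "$unit_file" &&
    grep -Eq '^[[:space:]]*SendSIGKILL[[:space:]]*=[[:space:]]*no[[:space:]]*$' "$unit_file"
}
```

On a RHEL 7 node the packaged unit would normally be under /usr/lib/systemd/system/ (an assumption about this package's layout), so the call would be along the lines of `unit_survives_restart /usr/lib/systemd/system/os-collect-config.service`.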

Comment 8 Steve Baker 2016-07-15 21:04:19 UTC

RDO fix posted. I'm on PTO next week, so someone can take ownership of that change if anything needs fixing.

Comment 10 mlammon 2016-08-05 13:29:23 UTC

8.0 GA -> 9 upgrade
I followed the latest upgrade guide and finished on 02 AUG 16
(http://etherpad.corp.redhat.com/ospd9-upgrade)

Initial deployment:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1   --neutron-network-type vxlan --neutron-tunnel-types vxlan  --ntp-server clock.redhat.com --timeout 90 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml -e network-environment.yaml --ceph-storage-scale 1

[root@overcloud-controller-1 ~]# rpm -qa | grep os-collect-config
os-collect-config-0.1.37-6.el7ost.noarch
[root@overcloud-controller-1 ~]# date
Fri Aug  5 13:29:02 UTC 2016

I did not see this issue reported during this upgrade.
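The same verification can be scripted by comparing the installed version-release against the "Fixed In Version" from this bug (0.1.37-6.el7ost). This is a sketch under assumptions: `has_fix` is a made-up helper, and it relies on GNU `sort -V`, which is not RPM's own comparison algorithm but orders these particular strings the same way.

```shell
# Sketch: succeed if the given version-release is at least the fixed one.
# Assumption: GNU "sort -V" ordering is close enough to RPM's comparison
# for version-release strings of this shape; it is not an exact substitute.
has_fix() {
    local installed="$1" fixed="0.1.37-6.el7ost"
    # The fixed version sorts first (or ties) when installed >= fixed.
    [ "$(printf '%s\n%s\n' "$fixed" "$installed" | sort -V | head -n1)" = "$fixed" ]
}
```

On a node it could be fed from rpm, for example `has_fix "$(rpm -q --qf '%{VERSION}-%{RELEASE}' os-collect-config)" && echo fixed`.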

Comment 12 errata-xmlrpc 2016-08-11 11:33:43 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1599.html
