Bug 1382694 - service serving certs are created after attempt to upgrade master packages causing restart of services to fail
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.3.1
Assignee: Scott Dodson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-10-07 12:39 UTC by Ian Tewksbury
Modified: 2016-10-27 16:13 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, on hosts that are both masters and nodes, the upgrade procedure restarted the node service before restarting the master services. This caused the upgrade to fail, because the master services must be updated before the node services to ensure that new API endpoints and security policies are applied. Now the node service is restarted only when the node services are updated, which happens after the masters have been upgraded, ensuring that upgrades work as expected.
Clone Of:
Environment:
Last Closed: 2016-10-27 16:13:59 UTC
Target Upstream Version:
Embargoed:




Links
System: Red Hat Product Errata
ID: RHBA-2016:2122
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: OpenShift Container Platform atomic-openshift-utils bug fix update
Last Updated: 2016-10-27 20:11:30 UTC

Description Ian Tewksbury 2016-10-07 12:39:23 UTC
Description of problem:

When performing the upgrade from OCP 3.2 to OCP 3.3, we ran into an issue in which the service serving certs were not created before the attempt to upgrade the master packages. One of the tasks in upgrading the master packages is a service restart. Because the service serving certs are now required (https://github.com/openshift/openshift-docs/pull/2324, https://github.com/openshift/openshift-ansible/pull/2358, https://bugzilla.redhat.com/show_bug.cgi?id=1366975), the master services could not restart: they were looking for certs that had not yet been created.
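
The failure mode above could be caught with a preflight check before any master restart. The following is only a minimal sketch, not part of the playbooks: the file names `service-signer.crt` and `service-signer.key` are assumptions (this report does not name the cert files), and both function names are hypothetical.

```python
from pathlib import Path

# Assumed cert file names; the bug report itself does not list them.
ASSUMED_CERT_FILES = ("service-signer.crt", "service-signer.key")


def missing_serving_certs(config_dir, names=ASSUMED_CERT_FILES):
    """Return the assumed cert files that are absent from config_dir."""
    base = Path(config_dir)
    return [name for name in names if not (base / name).is_file()]


def safe_to_restart_master(config_dir, names=ASSUMED_CERT_FILES):
    """True only when every assumed cert file is already in place."""
    return not missing_serving_certs(config_dir, names)
```

Calling `safe_to_restart_master()` on the master config directory before the package upgrade would report the missing certs up front instead of letting the restart fail.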


Version-Release number of selected component (if applicable):

3.3

How reproducible:

Only reproduced on the one upgrade. No other attempts to reproduce have been made.


Steps to Reproduce:
The first upgrade attempt used the playbooks shipped with the 3.3 RPMs, version 3.3.28-1. That version does not contain the Ansible tasks that install the certs if needed. From that run it appeared that the upgrade had worked, but on a subsequent run of playbooks/byo/config.yml the services attempted to restart and ran into the error.

We then tried the latest Ansible playbooks for 3.3, version 3.3.30-1. Those playbooks have the tasks to install the certs if needed, but not in the correct order. To resolve the issue, we had to update the playbooks to install the certs before doing the master package upgrade. Patch link below.

1. playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml (3.3.28-1, shipped with 3.3 RPMs)
2. playbooks/byo/update.yml (3.3.28-1, shipped with 3.3 RPMs)
3. run failed due to missing certs
4. download 3.3.30-1 scripts from GitHub
5. playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml (3.3.30-1)
6. failed with the same error, because the step that installs the certs comes just after the package upgrade, which is the step that fails due to the restart
7. apply patch (https://github.com/openshift/openshift-ansible/pull/2557)
8. playbooks/byo/openshift-cluster/upgrades/v3_3/upgrade.yml (3.3.30-1 with patch)
9. success
10. playbooks/byo/update.yml (3.3.30-1 with patch)
11. success
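
The ordering problem in the steps above can be expressed as a small check. This is an illustrative sketch only: the task labels are hypothetical stand-ins, not the real task names from openshift-ansible, and `first_ordering_violation` is a hypothetical helper.

```python
def first_ordering_violation(tasks, prerequisite, dependent):
    """Return a message if `dependent` would run without `prerequisite`
    having run first, else None."""
    if dependent not in tasks:
        return None  # nothing depends on the prerequisite in this run
    if prerequisite not in tasks:
        return f"'{prerequisite}' is never run"
    if tasks.index(prerequisite) > tasks.index(dependent):
        return f"'{prerequisite}' runs after '{dependent}'"
    return None


# Hypothetical labels modelling the three playbook versions in this report.
CERTS = "create service serving certs"
UPGRADE = "upgrade master packages"

ORDER_3_3_28 = [UPGRADE]           # cert tasks missing entirely
ORDER_3_3_30 = [UPGRADE, CERTS]    # cert tasks present, but too late
ORDER_PATCHED = [CERTS, UPGRADE]   # certs installed before the upgrade

for name, order in [("3.3.28-1", ORDER_3_3_28),
                    ("3.3.30-1", ORDER_3_3_30),
                    ("patched", ORDER_PATCHED)]:
    print(name, first_ordering_violation(order, CERTS, UPGRADE))
```

Only the patched ordering yields `None`; the other two report why the restart during the package upgrade would fail.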

Actual results:

The upgrade failed when using the playbooks shipped with the 3.3 RPMs.

Expected results:

The upgrade should install the certs on the first run, before the master package upgrade, to avoid this issue.


Additional info:

Suggested patch: https://github.com/openshift/openshift-ansible/pull/2557

Comment 2 Scott Dodson 2016-10-18 19:51:39 UTC
Thanks, this should be fixed by https://github.com/openshift/openshift-ansible/pull/2593 which has been merged.

Comment 5 Anping Li 2016-10-20 04:46:51 UTC
Verified; passes using atomic-openshift-utils-3.3.38-1.

Comment 7 errata-xmlrpc 2016-10-27 16:13:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2122

