Bug 1452367

Summary:	Redeploy CA will try to restart services when certs are expired, causing failure.
Product:	OpenShift Container Platform	Reporter:	Ryan Howe <rhowe>
Component:	Installer	Assignee:	Andrew Butcher <abutcher>
Status:	CLOSED ERRATA	QA Contact:	Gaoyun Pei <gpei>
Severity:	urgent	Docs Contact:
Priority:	high
Version:	3.5.0	CC:	aos-bugs, jokerman, mmccomas, smunilla
Target Milestone:	---
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	The OpenShift CA redeployment playbook (playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml) would fail to restart services if certificates were previously expired. Service restarts are now skipped within the OpenShift CA redeployment playbook when expired certificates are detected. Expired cluster certificates may be replaced with the certificate redeployment playbook (playbooks/byo/openshift-cluster/redeploy-certificates.yml) once the OpenShift CA certificate has been replaced via the OpenShift CA redeployment playbook.	Story Points:	---
Clone Of:
Clones:	1460969 1460970 1460971 1460972 (view as bug list)		Environment:
Last Closed:	2017-08-10 05:24:06 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1460969, 1460970, 1460971, 1460972

Description Ryan Howe 2017-05-18 18:59:42 UTC

Description of problem:

https://docs.openshift.com/container-platform/3.5/install_config/redeploying_certificates.html#redeploying-new-custom-ca

$ ansible-playbook -i <inventory_file> \
    /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml

If the server certs have already expired by the time the playbook restarts services, the restarts fail.

How reproducible:
100%

Steps to Reproduce:
1. Let the server certs and the CA expire.
2. Redeploy the OpenShift and etcd CAs with the playbook above.

Actual results:
The playbook fails when restarting etcd because the certs are invalid.

Expected results:
The redeploy should skip the service restart and go straight to creating new server certs to replace the expired ones.
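
Additional info:
The expired-cert condition can be checked by hand before running the playbook. The following is a minimal sketch, not the detection logic the playbook itself uses. The function name cert_expires_within is made up for illustration, and it relies only on the `-checkend` option of `openssl x509`.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-ansible): succeeds (returns 0)
# when the certificate in $1 expires within the next $2 seconds.
# A window of 0 means "has already expired".
cert_expires_within() {
    cert=$1
    seconds=${2:-0}

    # Return 2 for an unreadable file so it is not mistaken for an expired cert.
    [ -r "$cert" ] || return 2

    # `openssl x509 -checkend N` exits 0 when the cert is still valid N seconds
    # from now and non-zero otherwise. A file that openssl cannot parse also
    # makes it exit non-zero, so such a file is reported as expiring.
    if openssl x509 -noout -checkend "$seconds" -in "$cert" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

For example, `cert_expires_within /etc/origin/master/ca.crt 0 && echo expired` reports an expired CA cert. The path is an assumption about a typical 3.x master, so adjust it for the host being checked.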

Comment 1 Andrew Butcher 2017-06-02 19:17:56 UTC

Proposed fix https://github.com/openshift/openshift-ansible/pull/4360

Note that I've tested by running a script on a combination etcd/master/node instance which drops an expired certificate for each service. https://gist.github.com/abutcher/b5cfa5451c790185d3a34ca1bc1a820f#file-gen_expired_tls-sh

Comment 3 Gaoyun Pei 2017-06-13 09:23:24 UTC

Verified this bug with openshift-ansible-3.6.98-1.git.0.e651d65.el7.noarch.

With the OpenShift certs and CA cert expired, redeploy the CA certs first:
ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-openshift-ca.yml
The redeploy CA playbook skips restarting the etcd/master/node services because expired certs are detected.

Next, redeploy the OpenShift certs:
ansible-playbook -i host /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/redeploy-certificates.yml
This playbook generates new certs and restarts the etcd/master/node services.

All the certs were then replaced with new ones, and the OCP environment works again.
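
The order of the two playbooks matters, so the sequence can be wrapped in a small script. This is only a sketch: the function name and the ANSIBLE_PLAYBOOK override variable are made up for illustration, and the playbook paths are the ones used in this bug.

```shell
#!/bin/sh
# Directory that holds both playbooks referenced in this bug.
PLAYBOOK_DIR=/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster

# Hypothetical override so the command can be stubbed out for a dry run.
ANSIBLE_PLAYBOOK=${ANSIBLE_PLAYBOOK:-ansible-playbook}

# Hypothetical wrapper: replace the CA first, then the cluster certs.
# The second playbook runs only if the first one succeeds.
redeploy_expired_cluster_certs() {
    inventory=$1
    "$ANSIBLE_PLAYBOOK" -i "$inventory" "$PLAYBOOK_DIR/redeploy-openshift-ca.yml" &&
        "$ANSIBLE_PLAYBOOK" -i "$inventory" "$PLAYBOOK_DIR/redeploy-certificates.yml"
}
```

Calling `redeploy_expired_cluster_certs host` runs the same two commands as above. Setting ANSIBLE_PLAYBOOK=echo beforehand prints the arguments instead of running anything.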


This issue should also exist in the ocp-3.2/3.3/3.4/3.5 installers. I will clone it to 3.2-3.5 to make sure the fix is backported to the previous installers.

Comment 5 errata-xmlrpc 2017-08-10 05:24:06 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716