Bug 1572874

Summary:	[3.9] atomic-openshift-master-controllers service failed to start after certs redeployment
Product:	OpenShift Container Platform	Reporter:	Gaoyun Pei <gpei>
Component:	Installer	Assignee:	Scott Dodson <sdodson>
Status:	CLOSED DEFERRED	QA Contact:	Gaoyun Pei <gpei>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	3.9.0	CC:	aos-bugs, jokerman, mmccomas
Target Milestone:	---
Target Release:	3.9.z
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2018-11-16 15:14:12 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Gaoyun Pei 2018-04-28 10:24:55 UTC

Description of problem:
Set up an OCP 3.9 cluster, then set the system time on each host forward by 2 years so that the signed OCP certificates expire. Run redeploy-certificates.yml to replace the expired certificates.


After the playbook finished, the atomic-openshift-master-controllers service failed to start:

[root@qe-gpei-392-master-etcd-nfs-1 service-catalog]# journalctl  -f -u atomic-openshift-master-controllers.service
-- Logs begin at Fri 2020-05-01 00:00:00 EDT. --
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.346358   34392 controller_utils.go:1026] Caches are synced for scheduler controller
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.346379   34392 leaderelection.go:175] attempting to acquire leader lease  kube-system/kube-scheduler...
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.349528   34392 leaderelection.go:243] lock is held by qe-gpei-392-master-etcd-nfs-1_9e3f4f89-8b61-11ea-80f0-42010af0000f and has not yet expired
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.349542   34392 leaderelection.go:180] failed to acquire lease kube-system/kube-scheduler
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.764841   34392 request.go:1076] body was not decodable (unable to check for Status): Object 'Kind' is missing in 'Error: 'x509: certificate has expired or is not yet valid'
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: Trying to reach: 'https://172.30.186.202:443/apis/servicecatalog.k8s.io/v1beta1''
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: F0501 00:28:22.792148   34392 start_master.go:655] Error starting "openshift.io/cluster-quota-reconciliation" (failed to discover resources: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: an error on the server ("Error: 'x509: certificate has expired or is not yet valid'\nTrying to reach: 'https://172.30.186.202:443/apis/servicecatalog.k8s.io/v1beta1'") has prevented the request from succeeding)
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.



Since the service catalog is installed by default in 3.9, it seems the service-catalog certificates also need to be updated when redeploying certificates.
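To confirm which certificates have actually expired, a quick check with openssl can help. The following is a minimal sketch, not part of openshift-ansible: `cert_status` is a hypothetical helper that wraps `openssl x509 -checkend`. Its optional offset argument tests a certificate against a future time (roughly two years is 63072000 seconds, matching the clock shift used here) without changing the system clock. This report does not say which certificate files the service catalog uses on a 3.9 host, so the path is left to the caller.

```shell
# Hypothetical helper: report whether a PEM certificate will have expired
# OFFSET seconds from now (default 0, i.e. right now).
# Usage: cert_status FILE [OFFSET]
cert_status() {
    file=$1
    offset=${2:-0}
    # Distinguish a missing/unreadable file from an expired certificate.
    if [ ! -r "$file" ]; then
        echo "unreadable"
        return 2
    fi
    # openssl exits 0 if the certificate will NOT expire within $offset
    # seconds, and nonzero if it will (or if the file cannot be parsed).
    if openssl x509 -checkend "$offset" -noout -in "$file" >/dev/null; then
        echo "valid"
    else
        echo "expired"
    fi
}
```

For example, the serving certificate presented at the service catalog address from the log above can be saved from a host that can reach that service IP (such as a master) with `openssl s_client -connect 172.30.186.202:443 </dev/null 2>/dev/null | openssl x509 > /tmp/sc.pem` and then checked with `cert_status /tmp/sc.pem`.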


Version-Release number of the following components:
openshift-ansible-3.9.27-1.git.0.52e35b5.el7.noarch


How reproducible:
100%

Steps to Reproduce:
1. Run playbooks/redeploy-certificates.yml against an OCP 3.9 cluster whose signed OCP certificates have all expired.
ansible-playbook -i host/39 /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml


Actual results:
The atomic-openshift-master-controllers service is not running.


Expected results:
The service-catalog certificate files should also be updated.


Additional info:

Comment 1 Russell Teague 2018-11-16 15:14:12 UTC

There are no active cases related to this bug. As such, we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if it becomes relevant to an open customer case.