Bug 1572874 - [3.9] atomic-openshift-master-controllers service failed to start after certs redeployment
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.z
Assignee: Scott Dodson
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-04-28 10:24 UTC by Gaoyun Pei
Modified: 2018-11-16 15:14 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-11-16 15:14:12 UTC
Target Upstream Version:
Embargoed:



Description Gaoyun Pei 2018-04-28 10:24:55 UTC
Description of problem:
Set up an ocp-3.9 cluster, then set the server time two years ahead on each host so that the signed ocp certs expire. Run redeploy-certificates.yml to update the expired certs.


After the playbook finished, the atomic-openshift-master-controllers service failed to start:

[root@qe-gpei-392-master-etcd-nfs-1 service-catalog]# journalctl  -f -u atomic-openshift-master-controllers.service
-- Logs begin at Fri 2020-05-01 00:00:00 EDT. --
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.346358   34392 controller_utils.go:1026] Caches are synced for scheduler controller
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.346379   34392 leaderelection.go:175] attempting to acquire leader lease  kube-system/kube-scheduler...
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.349528   34392 leaderelection.go:243] lock is held by qe-gpei-392-master-etcd-nfs-1_9e3f4f89-8b61-11ea-80f0-42010af0000f and has not yet expired
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.349542   34392 leaderelection.go:180] failed to acquire lease kube-system/kube-scheduler
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: I0501 00:28:22.764841   34392 request.go:1076] body was not decodable (unable to check for Status): Object 'Kind' is missing in 'Error: 'x509: certificate has expired or is not yet valid'
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: Trying to reach: 'https://172.30.186.202:443/apis/servicecatalog.k8s.io/v1beta1''
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 atomic-openshift-master-controllers[34392]: F0501 00:28:22.792148   34392 start_master.go:655] Error starting "openshift.io/cluster-quota-reconciliation" (failed to discover resources: unable to retrieve the complete list of server APIs: servicecatalog.k8s.io/v1beta1: an error on the server ("Error: 'x509: certificate has expired or is not yet valid'\nTrying to reach: 'https://172.30.186.202:443/apis/servicecatalog.k8s.io/v1beta1'") has prevented the request from succeeding)
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
May 01 00:28:22 qe-gpei-392-master-etcd-nfs-1 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.



Since service catalog is installed by default in 3.9, it seems we also have to update the service-catalog certs when redeploying the certs.
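
To narrow down which endpoint is still serving an expired cert, the fatal line in the journal can be filtered. The following is only a rough sketch, not part of openshift-ansible: the function name x509_failed_endpoints is made up for illustration, and it assumes the log line format shown above, where the x509 error and the "Trying to reach" URL appear on the same line.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with OpenShift): read journal output on
# stdin and print each unique URL reported together with the
# "x509: certificate has expired or is not yet valid" error.
# Assumes the error and "Trying to reach: '<url>'" are on the same line,
# as in the fatal start_master.go message; at most one URL per line is printed.
x509_failed_endpoints() {
    sed -n "s|.*x509: certificate has expired or is not yet valid.*Trying to reach: '\(https://[^']*\)'.*|\1|p" | sort -u
}
```

Possible usage on a master: journalctl --no-pager -u atomic-openshift-master-controllers.service | x509_failed_endpoints. For the log above this should point at the servicecatalog.k8s.io/v1beta1 endpoint, which matches the guess that the service-catalog certs were not redeployed.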


Version-Release number of the following components:
openshift-ansible-3.9.27-1.git.0.52e35b5.el7.noarch


How reproducible:
100%

Steps to Reproduce:
1. Run playbooks/redeploy-certificates.yml against an ocp-3.9 cluster in which all signed ocp certs have expired.
ansible-playbook -i host/39 /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yml


Actual results:
The atomic-openshift-master-controllers service is not running.


Expected results:
The cert files for service-catalog should also be updated.


Additional info:

Comment 1 Russell Teague 2018-11-16 15:14:12 UTC
There are no active cases related to this bug. As such we're closing this bug in order to focus on bugs that are still tied to active customer cases. Please re-open this bug if this bug becomes relevant to an open customer case.

