Description of problem:
Ran 'ansible-playbook -v /usr/share/ansible/openshift-ansible/playbooks/redeploy-certificates.yaml', and the webconsole, console, and logging components are all reporting "x509: certificate signed by unknown authority".

Version-Release number of selected component (if applicable):
OpenShift 3.11

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

1) The customer first ran the playbook at 10:00 CDT and it failed after ~40 min. At that point all 3 masters were in a 'NotReady' state. The customer reran the same playbook (2nd run) hoping it would restore them to service, but they did not return to normal. The customer was able to get them back online by bootstrapping them per the same document the customer referenced earlier. The customer then restarted the playbook, which continued to run.

2) Pods hosted on the 3 masters (logging-fluentd, webconsole, and console) would not come online, with 'x509: certificate signed by unknown authority' errors in the events. This started with the first run of the playbook and continued all the way through the 3rd unsuccessful run. The customer was able to resolve this by restarting the ovs-* pods in the openshift-sdn namespace on the masters. After the ovs pods were restarted, the other pods started to come online.

3) The 3rd run of the playbook almost completed, but issues arose towards the end when openshift-web-console failed to start up. The customer noticed the router pods were also stuck rolling out with their new certificates. Two new pods had started but none of the others were proceeding. At this time I cancelled the rollout of the new router as users started complaining of issues with their applications. At this time, all nodes in the cluster started reporting 'NotReady'.

fatal: [1002apfrp00021.optumfe.com]: FAILED! => {"attempts": 60, "changed": false, "module_results": {"cmd": "/usr/bin/oc get deployment webconsole -o json -n openshift-web-console", "results": "observedGeneration": 3, "replicas": 3, "unavailableReplicas": 3, "updatedReplicas": 3}}]

Expected results:
The playbook runs successfully and the certificates are updated.

Additional info:
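For anyone reading the failed task output above: the status fields it reports can be interpreted roughly as in the sketch below. This is only an illustration of how a "rollout finished" condition might be evaluated from those fields. The function name `rollout_complete` is hypothetical, and the condition is an assumption, not necessarily the one the playbook task uses.

```python
def rollout_complete(status: dict) -> bool:
    """Return True when every replica is updated and none is unavailable.

    Assumed condition for illustration only; the playbook's real
    retry condition may differ.
    """
    replicas = status.get("replicas", 0)
    return (
        replicas > 0
        and status.get("updatedReplicas", 0) == replicas
        # The field is omitted by the API when no replica is unavailable.
        and status.get("unavailableReplicas", 0) == 0
    )


# Values taken from the failed task output in this report.
reported = {
    "observedGeneration": 3,
    "replicas": 3,
    "unavailableReplicas": 3,
    "updatedReplicas": 3,
}

print(rollout_complete(reported))  # False: all 3 replicas are unavailable
```

With the reported values, all 3 replicas were updated to the new generation but none became available, which is consistent with the webconsole pods failing on the x509 errors described in item 2.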
Moving to the Ansible team; I do not know what the playbook actually does.
To be reviewed as part of https://issues.redhat.com/browse/CORS-1470
Jira issue https://issues.redhat.com/browse/CORS-1470 was not scheduled for the current sprint.
Verify this bug with openshift-ansible-3.11.299-1.git.0.2dfaf92.el7.noarch.rpm.

1. Redeploy the OpenShift CA:
ansible-playbook openshift-ansible/playbooks/openshift-master/redeploy-openshift-ca.yml -v

2. Redeploy the OpenShift certificates:
ansible-playbook openshift-ansible/playbooks/redeploy-certificates.yml -v

09-29 22:11:48 TASK [Check servingInfo.clientCA = ca.crt in master config] ********************
09-29 22:11:48 fatal: [ec2-52-90-69-73.compute-1.amazonaws.com]: FAILED! => {"changed": false, "msg": "Detected an incomplete OpenShift CA redeployment. Please set openshift_redeploy_openshift_ca=true in the inventory and re-run redeploy-certifcates.yml\n"}
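Going by the task name in the output above, the check compares servingInfo.clientCA in the master config against ca.crt. A minimal sketch of an equivalent comparison, assuming the master config has already been parsed into a dict (the function name `client_ca_matches` and the sample values are hypothetical, and this is not the playbook's actual implementation):

```python
def client_ca_matches(master_config: dict, expected: str = "ca.crt") -> bool:
    """Return True when servingInfo.clientCA equals the expected file name."""
    serving_info = master_config.get("servingInfo") or {}
    return serving_info.get("clientCA") == expected


# Hypothetical parsed configs, for illustration only.
complete = {"servingInfo": {"clientCA": "ca.crt"}}
incomplete = {"servingInfo": {"clientCA": "ca-bundle.crt"}}

print(client_ca_matches(complete))    # True
print(client_ca_matches(incomplete))  # False
```

A mismatch here would correspond to the "incomplete OpenShift CA redeployment" message, for which the task suggests setting openshift_redeploy_openshift_ca=true in the inventory and re-running the playbook.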
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 3.11.306 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4170