Bug 1738857
| Summary: | All masters and nodes become "NotReady" after restarting the kubelet during certificate recovery | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | zhou ying <yinzhou> | |
| Component: | Node | Assignee: | Ryan Phillips <rphillips> | |
| Status: | CLOSED ERRATA | QA Contact: | Sunil Choudhary <schoudha> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | high | |||
| Version: | 4.2.0 | CC: | aos-bugs, jokerman, mfojtik, sjenning | |
| Target Milestone: | --- | |||
| Target Release: | 4.2.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| Clones: | 1749271 | Environment: | | |
| Last Closed: | 2019-10-16 06:35:15 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1749271 | |||
|
Description
zhou ying
2019-08-08 09:55:14 UTC
Ryan, new kubelet certificates are obtained by running the `recover-kubeconfig.sh` script. Would you mind taking a look at why that didn't happen here?
Also, I think you have made `/etc/kubernetes/ca.crt` managed now (+1), so we probably don't need this hack:
`# oc get configmap kube-apiserver-to-kubelet-client-ca -n openshift-kube-apiserver-operator --template='{{ index .data "ca-bundle.crt" }}' > /etc/kubernetes/ca.crt`
The point is that QA should probably be using updated steps rather than the 4.1 ones, if we have them.
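The CA-refresh hack quoted above overwrites the file unconditionally. Below is a minimal sketch of an idempotent variant, assuming a POSIX shell with `oc` already logged in with cluster-admin rights. It reuses the exact `oc get configmap ... --template=...` command from the thread. The function names `fetch_kubelet_client_ca` and `refresh_ca_file` are hypothetical, and the target path is passed as an argument because the thread mentions both `/etc/kubernetes/ca.crt` (the 4.1 hack) and `/etc/kubernetes/kubelet-ca.crt` (the updated docs).

```shell
#!/bin/sh
# Sketch (hypothetical helper names): refresh a kubelet CA file from the
# kube-apiserver-to-kubelet-client-ca configmap only when the contents differ.

# Print the CA bundle held in the configmap; same command as the hack above.
fetch_kubelet_client_ca() {
  oc get configmap kube-apiserver-to-kubelet-client-ca \
    -n openshift-kube-apiserver-operator \
    --template='{{ index .data "ca-bundle.crt" }}'
}

# refresh_ca_file PATH
# Prints "updated" if PATH was (re)written, "unchanged" if it already matched.
# Returns non-zero if the configmap could not be read or PATH not written.
refresh_ca_file() {
  local target tmp
  target="$1"
  tmp="$(mktemp)" || return 1
  if ! fetch_kubelet_client_ca > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  if [ -f "$target" ] && cmp -s "$tmp" "$target"; then
    echo "unchanged"
  else
    # Writing through cat keeps the mode and owner of an existing file.
    if ! cat "$tmp" > "$target"; then
      rm -f "$tmp"
      return 1
    fi
    echo "updated"
  fi
  rm -f "$tmp"
}
```

For example, `refresh_ca_file /etc/kubernetes/kubelet-ca.crt` prints `updated` or `unchanged`, so the command is safe to re-run during a recovery attempt.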
I'll take a look at the documentation steps. It looks like the docs need to be updated to use /etc/kubernetes/kubelet-ca.crt. However, I'm still not getting a clean recovery. Still researching...

The CSR is requested but not auto-approved on my machine. `oc get csr` lists the pending CSR, and `oc adm certificate approve [csr-name]` approves it. My cluster recovered after these two tweaks.

I talked to Andrea about getting the docs updated to use the correct kubelet-ca.crt.

Recovery script PR to use the internal URI endpoint: https://github.com/openshift/machine-config-operator/pull/1062

@zhou For this BZ, a documentation patch has been generated and one MCO PR has been created to tweak the endpoint used in `recover-kubeconfig.sh`. Depending on how the kubelet recovery is being tested, the CSR may or may not get signed automatically; step 12 of the recovery doc covers the CSR approval process. Once the MCO patch merges, this bug should be able to move to MODIFIED.

Confirmed with the latest payload, 4.2.0-0.nightly-2019-08-15-205330: the issue has been fixed.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922
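The manual CSR step described in this thread (`oc get csr` to find the pending request, then `oc adm certificate approve [csr-name]`) can be scripted. The sketch below is an illustration under stated assumptions, not part of `recover-kubeconfig.sh` or the recovery doc: it treats any CSR with an empty `.status` as pending (neither approved nor denied) using a common `oc get -o go-template` idiom, and the function names `list_pending_csrs` and `approve_pending_csrs` are hypothetical. Approving every pending CSR blindly is only reasonable on a cluster you control, since it does not check who submitted each request.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): approve every pending CSR, scripting the
# manual `oc get csr` + `oc adm certificate approve <name>` steps.

# Print one name per line for CSRs whose .status is empty,
# i.e. requests that have been neither approved nor denied yet.
list_pending_csrs() {
  oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}'
}

approve_pending_csrs() {
  local name names count
  count=0
  names="$(list_pending_csrs)" || return 1
  # Unquoted on purpose: CSR names contain no whitespace or glob characters,
  # so word splitting yields one name per iteration.
  for name in $names; do
    oc adm certificate approve "$name" || return 1
    count=$((count + 1))
  done
  echo "approved $count CSR(s)"
}
```

Run `approve_pending_csrs` after restarting the kubelet. It may need to be run more than once, since a kubelet whose client CSR was just approved can then submit a further CSR for its serving certificate.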