Bug 1771410
| Field | Value |
|---|---|
| Summary | [UPI]Failed to recovery from expired certificates with all nodes "NotReady" |
| Product | OpenShift Container Platform |
| Component | kube-apiserver |
| Version | 4.3.0 |
| Target Release | 4.4.0 |
| Status | CLOSED ERRATA |
| Type | Bug |
| Severity | medium |
| Priority | medium |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | zhou ying <yinzhou> |
| Assignee | Tomáš Nožička <tnozicka> |
| QA Contact | Xingxing Xia <xxia> |
| CC | aos-bugs, deads, eparis, jokerman, mfojtik, mfuruta, rphillips, sttts, tnozicka |
| Doc Type | No Doc Update |
| Clones | 1805398 1810008 |
| Bug Depends On | 1810008 |
| Last Closed | 2020-05-04 11:15:07 UTC |
Description
zhou ying
2019-11-12 10:12:08 UTC
> Version-Release number of selected component (if applicable):
> 4.3.0-0.nightly-2019-11-11-182924
>
> https://docs.openshift.com/container-platform/4.1/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html

These don't go together: the kubelet procedure has changed over time. Use https://docs.openshift.com/container-platform/4.2/backup_and_restore/disaster_recovery/scenario-3-expired-certs.html or the latest version of those docs.

This one looks weird, though:

```
Nov 12 09:27:03 yinzho-tjwqc-m-2.c.openshift-qe.internal hyperkube[11728]: E1112 09:27:03.011584 11728 reflector.go:123] k8s.io/kubernetes/pkg/kubelet/kubelet.go:459: Failed to list *v1.Node: Get https://api-int.yinzhou.qe.gcp.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dyinzho-tjwqc-m-2.c.openshift-qe.internal&limit=500&resourceVersion=0: x509: certificate has expired or is not yet valid
```

If you hit this, please use openssl to dump the certificate served at that URL and attach it to the BZ (redact the modulus and other private fields). Also check the time on those nodes and on your machine to make sure the clocks are synced, and record the times in the BZ. For example:

```
openssl s_client -connect api-int.yinzhou.qe.gcp.devcluster.openshift.com:6443 | openssl x509 -noout -text
```

Does running the recovery a second time help?

After running the recovery a second time, two masters are still NotReady:

```
[root@yinzho-2bsks-m-0 ~]# oc get node
NAME                                              STATUS     ROLES    AGE     VERSION
yinzho-2bsks-m-0.c.openshift-qe.internal          Ready      master   5h7m    v1.16.2
yinzho-2bsks-m-1.c.openshift-qe.internal          NotReady   master   5h7m    v1.16.2
yinzho-2bsks-m-2.c.openshift-qe.internal          NotReady   master   5h7m    v1.16.2
yinzho-2bsks-w-a-vhglm.c.openshift-qe.internal    Ready      worker   4h55m   v1.16.2
yinzho-2bsks-w-b-b7rdw.c.openshift-qe.internal    Ready      worker   4h58m   v1.16.2
yinzho-2bsks-w-c-wqrjw.c.openshift-qe.internal    Ready      worker   4h58m   v1.16.2
```

Bug 1797897 is no longer seen, but I hit another issue: bug 1802944.

Hit bug 1797897 again; reported new bug 1806930.
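The kubelet error quoted above, `x509: certificate has expired or is not yet valid`, is what a verifier reports when its own clock falls outside the certificate's Not Before / Not After window. That is why both the certificate dump and the node clocks were requested: two hosts with different clocks can disagree about the same certificate. As a minimal sketch (not part of the OpenShift recovery procedure), the Python below parses the Validity lines from `openssl x509 -noout -text` output and classifies a given time against them. The helper names `parse_validity` and `classify` and the sample certificate dates are hypothetical. Only the log timestamp comes from this report, with the year 2019 assumed from the comment date.

```python
import re
import ssl
from typing import Tuple

# Matches the validity lines printed by `openssl x509 -noout -text`, e.g.
#   Not Before: Nov 11 18:29:24 2019 GMT
#   Not After : Nov 12 09:00:00 2019 GMT
_NOT_BEFORE = re.compile(r"Not Before\s*:\s*(.+?)\s*$", re.MULTILINE)
_NOT_AFTER = re.compile(r"Not After\s*:\s*(.+?)\s*$", re.MULTILINE)


def parse_validity(openssl_text: str) -> Tuple[str, str]:
    """Extract the (notBefore, notAfter) strings from openssl's text dump."""
    before = _NOT_BEFORE.search(openssl_text)
    after = _NOT_AFTER.search(openssl_text)
    if before is None or after is None:
        raise ValueError("no Validity section found in openssl output")
    return before.group(1), after.group(1)


def classify(not_before: str, not_after: str, now: int) -> str:
    """Classify `now` (epoch seconds on the verifying host) against the window.

    The window is inclusive: valid when not_before <= now <= not_after.
    """
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    if now < start:
        return "not yet valid"
    if now > end:
        return "expired"
    return "valid"


# Hypothetical certificate dates, for illustration only (not from this cluster).
sample = """\
        Validity
            Not Before: Nov 11 18:29:24 2019 GMT
            Not After : Nov 12 09:00:00 2019 GMT
"""
nb, na = parse_validity(sample)
log_time = ssl.cert_time_to_seconds("Nov 12 09:27:03 2019 GMT")
print(classify(nb, na, log_time))         # expired
print(classify(nb, na, log_time - 3600))  # valid: a clock one hour behind still accepts it
```

The last two lines show the point of the clock check: the same certificate is rejected by a host whose clock reads the logged time but accepted by one whose clock lags by an hour, so unsynced nodes can report inconsistent readiness.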
I am purging the BZ deps, or we can't merge fixes: https://github.com/openshift/origin/pull/24630#issuecomment-594486876

Removing the https://bugzilla.redhat.com/show_bug.cgi?id=1811062 dependency, because the BZ merge bot can't handle a second dependency that targets the same release as this BZ rather than 4.5 like the first one: https://github.com/openshift/cluster-kube-scheduler-operator/pull/217#issuecomment-597568806. The dependency was added so that we merge only after the origin change lands in 4.4, and that change is merged now.

Verified in 4.4.0-0.nightly-2020-03-11-212258 using the steps in bug 1810008#c4. (BTW, I hit an issue already tracked in bug 1812593.) I also hit another issue that I had no time to analyze today; will check it the next day.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581