Bug 1743661
Summary: | Fail to bootstrap an UPI BM cluster with OCP 4.2 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Denis Ollier <dollierp> |
Component: | kube-apiserver | Assignee: | Stefan Schimanski <sttts> |
Status: | CLOSED NOTABUG | QA Contact: | Xingxing Xia <xxia> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.2.0 | CC: | aos-bugs, dmoessne, mfojtik, sjenning, wking |
Target Milestone: | --- | ||
Target Release: | 4.3.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2019-08-31 10:14:58 UTC | Type: | Bug |
Description
Denis Ollier
2019-08-20 12:10:42 UTC
> Aug 20 10:33:58 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com openshift.sh[2867]: error: unable to recognize "./99_kubeadmin-password-secret.yaml": Get https://localhost:6443/api?timeout=32s: x509: certificate is valid for api.bm1.oc4, not localhost

This is a red herring, and I've spun off bug 1743840 about quieting it down. I haven't dug into the actual failure cause here yet.

Please provide the log bundle from

```
openshift-install gather bootstrap --bootstrap <bootstrap-host-ip> --master <master-0-host-ip> [--master <master-N-host-ip>]
```

From an initial look at `cat <attachment>/journal-bootstrap.log | rg 'bootkube'`

```
Aug 20 10:49:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com bootkube.sh[3439]: E0820 10:49:59.088980 1 reflector.go:134] github.com/openshift/cluster-bootstrap/pkg/start/status.go:66: Failed to list *v1.Pod: Get https://localhost:6443/api/v1/pods: dial tcp [::1]:6443: connect: connection refused
Aug 20 10:49:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com bootkube.sh[3439]: [#1175] failed to fetch discovery: Get https://localhost:6443/api?timeout=32s: dial tcp [::1]:6443: connect: connection refused
Aug 20 10:49:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com bootkube.sh[3439]: [#1176] failed to fetch discovery: Get https://localhost:6443/api?timeout=32s: dial tcp [::1]:6443: connect: connection refused
Aug 20 10:49:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com bootkube.sh[3439]: [#1177] failed to fetch discovery: Get https://localhost:6443/api?timeout=32s: dial tcp [::1]:6443: connect: connection refused
Aug 20 10:49:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com bootkube.sh[3439]: [#1178] failed to fetch discovery: Get https://localhost:6443/api?timeout=32s: dial tcp [::1]:6443: connect: connection refused
Aug 20 10:49:59 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com bootkube.sh[3439]: [#1179] failed to fetch discovery: Get https://localhost:6443/api?timeout=32s: dial tcp [::1]:6443: connect: connection refused
```

the bootstrap-kube-apiserver is failing to start, so moving the component.

Seth, I don't see the reason for:

> Aug 22 08:38:53 cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com hyperkube[6863]: E0822 08:38:53.045487 6863 pod_workers.go:190] Error syncing pod 60d454f702c957e050f32e835f08f8f3 ("bootstrap-kube-apiserver-cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com_kube-system(60d454f702c957e050f32e835f08f8f3)"), skipping: failed to "StartContainer" for "kube-apiserver" with CrashLoopBackOff: "Back-off 2m40s restarting failed container=kube-apiserver pod=bootstrap-kube-apiserver-cnv-qe-08.cnvqe.lab.eng.rdu2.redhat.com_kube-system(60d454f702c957e050f32e835f08f8f3)"

None of the static pods come up.

I don't see any relevant container logs, so we can't do much.

Hi,

I retried with newer versions:

- OCP 4.2.0-0.nightly-2019-08-28-035628
- RHCOS 42.80.20190828.0

The bootstrap node was properly set up and I managed to get a working cluster.

> oc get clusterversion
>
> NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
> version   4.2.0-0.nightly-2019-08-28-035628   True        False         54s     Cluster version is 4.2.0-0.nightly-2019-08-28-035628

Some nodes often register as "localhost" even though they get a proper hostname from DNS, but that is probably a separate issue.

> oc get nodes
>
> NAME                                      STATUS   ROLES    AGE   VERSION
> cnv-qe-07.cnvqe.lab.eng.rdu2.redhat.com   Ready    master   27m   v1.14.0+b985ea310
> localhost                                 Ready    worker   18m   v1.14.0+b985ea310

Closing this issue. (I will probably open a new one for the localhost issue after more investigation.)

Thanks.
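
For anyone triaging a similar bootstrap journal, the `rg 'bootkube'` filter used earlier in this thread can be extended to count how often each distinct message repeats, which makes a retry loop such as the `connection refused` one stand out. This is only a minimal sketch: the function name `summarize_bootkube_errors` is hypothetical, it uses plain `grep` and `sed` instead of `rg`, and it assumes journal lines shaped like the excerpts in this bug (`... bootkube.sh[PID]: [#N] message`).

```shell
# Hypothetical helper (not part of openshift-install or any OpenShift tooling).
# Prints each distinct bootkube.sh message with its occurrence count,
# most frequent first. Argument: path to a bootstrap journal log file.
summarize_bootkube_errors() {
  # Keep only lines emitted by bootkube.sh.
  grep -F 'bootkube.sh[' "$1" |
    # Drop the "date host bootkube.sh[PID]: " prefix.
    sed -E 's/^.*bootkube\.sh\[[0-9]+\]: //' |
    # Drop the "[#N] " retry counter so repeated retries collapse together.
    sed -E 's/^\[#[0-9]+\] //' |
    # Count identical messages and sort by count, descending.
    sort | uniq -c | sort -rn
}
```

It could be called as `summarize_bootkube_errors <attachment>/journal-bootstrap.log`. Lines that carry their own klog timestamp (for example the `E0820 ...` reflector line) will not collapse, since the timestamp is left in place.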