+++ This bug was initially created as a clone of Bug #2010665 +++

Single node iBIP flow: Cluster bootstrap will try to send an event after tear down of the temporary control plane.

The issue happened in 5/7 runs of the single node live-iso CI:
https://sippy.ci.openshift.org/sippy-ng/jobs/4.9/runs?filters=%7B%22items%22%3A%5B%7B%22columnField%22%3A%22name%22%2C%22operatorValue%22%3A%22equals%22%2C%22value%22%3A%22periodic-ci-openshift-release-master-nightly-4.9-e2e-metal-single-node-live-iso%22%7D%5D%7D&sortField=timestamp&sort=desc

The issue started with 4.9.0-rc.5 (rc.4 works without issues).

See the race here:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-assisted-test-infra-master-e2e-metal-single-node-live-iso-periodic/1444452225400180736

From the logs we can see that kube-api got a shutdown request, which caused cluster-bootstrap to fail when sending the event.

kube-api:
I1003 00:22:20.076822 1 genericapiserver.go:421] "[graceful-termination] shutdown event" name="InFlightRequestsDrained"
I1003 00:22:20.076840 1 genericapiserver.go:751] Event(v1.ObjectReference{Kind:"Namespace", Namespace:"default", Name:"openshift-kube-apiserver", UID:"", APIVersion:"v1", ResourceVersion:"", FieldPath:""}): type: 'Normal' reason: 'InFlightRequestsDrained' All non long-running request(s) in-flight have drained
I1003 00:22:20.077676 1 dynamic_serving_content.go:144] "Shutting down controller" name="aggregator-proxy-cert::/etc/kubernetes/secrets/apiserver-proxy.crt::/etc/kubernetes/secrets/apiserver-proxy.key"

bootkube:
Oct 03 00:22:20 test-infra-cluster-master-0 bootkube.sh[2601]: Tearing down temporary bootstrap control plane...
Oct 03 00:22:20 test-infra-cluster-master-0 bootkube.sh[2601]: Sending bootstrap-finished event.The connection to the server api-int.test-infra-cluster.redhat.com:6443 was refused - did you specify the right host or port?
Oct 03 00:22:20 test-infra-cluster-master-0 systemd[1]: bootkube.service: Main process exited, code=exited, status=1/FAILURE
Oct 03 00:22:20 test-infra-cluster-master-0 systemd[1]: bootkube.service: Failed with result 'exit-code'.
Oct 03 00:22:25 test-infra-cluster-master-0 systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Oct 03 00:22:25 test-infra-cluster-master-0 systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 1.

--- Additional comment from Marius Cornea on 2021-10-06 12:51:40 UTC ---

The same issue is reproducing for me with the zero touch provisioning flow and 4.9.0-0.nightly-2021-10-05-004711.

--- Additional comment from Marius Cornea on 2021-10-06 14:07:56 UTC ---

--- Additional comment from Ryan Phillips on 2021-10-06 15:29:48 UTC ---

There is a new rhcos image being built for 4.9 that fixes pod status reporting, so this may clear up:
https://bugzilla.redhat.com/show_bug.cgi?id=2011050

--- Additional comment from Eran Cohen on 2021-10-06 15:33:05 UTC ---

Setting as a blocker since this issue fails the single node live-iso CI.

--- Additional comment from Eran Cohen on 2021-10-06 16:09:54 UTC ---

The actual issue is:
https://github.com/openshift/installer/blob/6617bc2e334654bb6e85976f049e51fc1c01aa3f/data/data/bootstrap/files/usr/local/bin/bootkube.sh.template#L427

Bootkube tries to use oc after cluster bootstrap is done and there is no API. This explains why all the bootkube retries failed with this error:

Oct 06 15:51:53 test-infra-cluster-master-0 bootkube.sh[243426]: Starting cluster-bootstrap...
Oct 06 15:51:53 test-infra-cluster-master-0 bootkube.sh[243426]: The connection to the server api-int.test-infra-cluster.redhat.com:6443 was refused - did you specify the right host or port?

It is not actually starting cluster-bootstrap; cluster-bootstrap is already done (the log message is misleading), and the command that fails is the oc patch.
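To make the failure mode concrete, below is a minimal bash sketch, not the actual bootkube.sh.template code: the kubeconfig path, the manifest path, and the retry_oc helper are all hypothetical. It shows one defensive way to run a late oc call that may race with the teardown of the temporary control plane: retry briefly, then decide explicitly whether a final failure should take down the unit.

#!/usr/bin/env bash
# Illustrative sketch only -- not the real bootkube.sh.template. The
# KUBECONFIG path, the manifest path, and the retry_oc helper are
# hypothetical; the point is the shape of the problem: once the temporary
# control plane is torn down, a late oc call gets "connection refused" on
# every attempt, so the script has to decide whether that should fail the
# whole unit.
set -uo pipefail

KUBECONFIG=${KUBECONFIG:-/etc/kubernetes/kubeconfig}

retry_oc() {
    # Run an oc command, retrying a few times with a short pause so a
    # transient API outage does not immediately kill the unit.
    local attempts=5 i
    for ((i = 1; i <= attempts; i++)); do
        oc --kubeconfig="$KUBECONFIG" "$@" && return 0
        echo "oc $* failed (attempt ${i}/${attempts}), retrying in 5s..." >&2
        sleep 5
    done
    return 1
}

echo "Tearing down temporary bootstrap control plane..."
# Teardown would happen here; after this point the bootstrap API does not
# come back, so any later call against it can only be refused.

echo "Sending bootstrap-finished event."
if ! retry_oc create -f /opt/openshift/bootstrap-finished-event.yaml; then
    # If the API is already gone, every retry is refused; log instead of
    # exiting non-zero so systemd does not restart bootkube in a loop.
    echo "Could not record bootstrap-finished event; API already unavailable." >&2
fi

The actual fix in the installer may well look different (for example, sending the event before the teardown); the sketch only illustrates why a bare oc call at that point cannot succeed once the bootstrap API is gone.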
--- Additional comment from Omri Hochman on 2021-10-06 17:56:10 UTC ---

*** This bug has been marked as a duplicate of bug 2011701 ***