Description:
Installing a cluster on upi-on-vsphere behind an HTTPS proxy with 4.7.0-0.nightly-2021-01-25-175331 fails at bootstrap-complete:

+ '/home/slave7/workspace/Launch Environment Flexy/workdir/openshift-install' wait-for bootstrap-complete --dir '/home/slave7/workspace/Launch Environment Flexy/workdir/install-dir'
level=info msg=Waiting up to 20m0s for the Kubernetes API at https://api.jimaupi.qe.devcluster.openshift.com:6443...
level=info msg=API v1.20.0+70dd98e up
level=info msg=Waiting up to 30m0s for bootstrapping to complete...
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=fatal msg=failed to wait for bootstrapping to complete: timed out waiting for the condition

On the control-plane node, the kubelet log repeatedly reports the following error:

Jan 26 08:54:17 control-plane-0 hyperkube[1574]: E0126 08:54:17.971852 1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Jan 26 08:54:22 control-plane-0 hyperkube[1574]: E0126 08:54:22.972907 1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Jan 26 08:54:27 control-plane-0 hyperkube[1574]: E0126 08:54:27.973599 1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?

Version:
$ openshift-install version
4.7.0-0.nightly-2021-01-25-175331

Platform:
upi-on-vsphere behind https proxy

What happened?
Installation failed while waiting for bootstrap to complete.

What did you expect to happen?
The cluster installs successfully.

How to reproduce it (as minimally and precisely as possible)?
Install upi-on-vsphere behind a proxy.

Anything else we need to know?
The issue does not reproduce on 4.7.0-0.nightly-2021-01-22-134922. Two installs on 4.7.0-0.nightly-2021-01-25-175331 both failed with the same error.
Created attachment 1750876
Attached bootstrap bundle log
After digging further into the failed cluster, it seems this issue was introduced by https://github.com/openshift/machine-config-operator/pull/2342

[root@control-plane-2 ~]# grep -r "etc/mco/proxy.env" /etc/systemd/
/etc/systemd/system/machine-config-daemon-firstboot.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/machine-config-daemon-pull.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/nodeip-configuration.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/pivot.service.d/10-mco-default-env.conf:EnvironmentFile=/etc/mco/proxy.env
[root@control-plane-2 ~]# cd /etc/systemd/system/
[root@control-plane-2 system]# pwd
/etc/systemd/system
[root@control-plane-2 system]# ls kubelet.service
kubelet.service  kubelet.service.d/  kubelet.service.requires/
[root@control-plane-2 system]# ls kubelet.service*
kubelet.service

kubelet.service.d:
10-mco-default-env.conf  20-logging.conf

kubelet.service.requires:
machine-config-daemon-firstboot.service
[root@control-plane-2 system]# cat kubelet.service.d/10-mco-default-env.conf
[Service]
Environment="GODEBUG=x509ignoreCN=0,madvdontneed=1"

No proxy configuration is dropped into the kubelet service folder: unlike the units listed by the grep above, neither kubelet.service nor its drop-ins reference /etc/mco/proxy.env.

[root@control-plane-2 manifests]# systemctl status machine-config-daemon-firstboot.service
● machine-config-daemon-firstboot.service - Machine Config Daemon Firstboot
   Loaded: loaded (/etc/systemd/system/machine-config-daemon-firstboot.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Tue 2021-01-26 08:31:31 UTC; 5h 30min ago
           └─ ConditionPathExists=/etc/ignition-machine-config-encapsulated.json was not met

[root@control-plane-2 manifests]# journalctl -f -u machine-config-daemon-firstboot.service
-- Logs begin at Tue 2021-01-26 08:28:53 UTC. --
Jan 26 08:31:02 control-plane-2 machine-config-daemon[2138]: I0126 08:31:02.747194 2138 rpm-ostree.go:184] Current origin is not custom
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576186 2138 rpm-ostree.go:211] Pivoting to: 47.83.202101251242-0 (2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391)
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576501 2138 rpm-ostree.go:243] Executing rebase from repo path /run/mco-machine-os-content/os-content-492386407/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e52aead8f8025eb8fe12a385a826a822fa94f9dc89e8d55abcc2bbf718f4b11f and checksum 2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576553 2138 rpm-ostree.go:261] Running captured: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-492386407/srv/repo:2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e52aead8f8025eb8fe12a385a826a822fa94f9dc89e8d55abcc2bbf718f4b11f --custom-origin-description Managed by machine-config-operator
Jan 26 08:31:11 control-plane-2 machine-config-daemon[2138]: I0126 08:31:11.771676 2138 update.go:1858] Rebooting node
Jan 26 08:31:11 control-plane-2 machine-config-daemon[2138]: I0126 08:31:11.776457 2138 update.go:1858] initiating reboot: Completing firstboot provisioning to rendered-master-e54b9be49bc54ed97eb6ff27e32f043d
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=killed, status=15/TERM
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'signal'.
Jan 26 08:31:11 control-plane-2 systemd[1]: Stopped Machine Config Daemon Firstboot.
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Consumed 16.058s CPU time
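The manual check on control-plane-2 (grep for the proxy file, then inspect kubelet.service.d/) can be bundled into a small shell function so other nodes can be checked the same way. This is a minimal sketch, not part of MCO or openshift-install: the function name check_kubelet_proxy_env is hypothetical, and it assumes the unit layout seen above (units under /etc/systemd/system, kubelet drop-ins as kubelet.service.d/*.conf). It only matches a literal `EnvironmentFile=/etc/mco/proxy.env` line, so a proxy supplied some other way (for example an inline `Environment=HTTPS_PROXY=...`, or an optional `EnvironmentFile=-...`) is not detected.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; not shipped with MCO or openshift-install.
# Usage: check_kubelet_proxy_env [UNIT_DIR] [PROXY_ENV_FILE]
# Returns 0 if kubelet.service or one of its drop-ins loads PROXY_ENV_FILE
# through a literal EnvironmentFile= line, 1 otherwise.
check_kubelet_proxy_env() {
  local unit_dir="${1:-/etc/systemd/system}"
  local proxy_env="${2:-/etc/mco/proxy.env}"
  local needle="EnvironmentFile=${proxy_env}"
  local f

  # List every unit or drop-in under unit_dir that references the proxy file
  # (same idea as the manual `grep -r "etc/mco/proxy.env" /etc/systemd/`).
  echo "Units referencing ${proxy_env}:"
  grep -rlF -- "${needle}" "${unit_dir}" 2>/dev/null | sort

  # Check kubelet's main unit and its drop-ins. An unmatched glob stays
  # literal in bash, so the -f test simply skips it.
  for f in "${unit_dir}/kubelet.service" "${unit_dir}"/kubelet.service.d/*.conf; do
    if [ -f "${f}" ] && grep -qF -- "${needle}" "${f}"; then
      echo "kubelet: ${proxy_env} loaded via ${f}"
      return 0
    fi
  done

  echo "kubelet: ${proxy_env} is NOT loaded by kubelet.service or its drop-ins"
  return 1
}
```

Run as root on a node, `check_kubelet_proxy_env` with no arguments scans /etc/systemd/system. On control-plane-2 as captured above, it should list the four units from the grep output and return 1, since kubelet.service.d/10-mco-default-env.conf only sets GODEBUG.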
Assuming the analysis in comment 3 is correct, I am moving this bug to MCO.
*** This bug has been marked as a duplicate of bug 1920027 ***