Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1920483

Summary: Bootstrap completion failed: network is not ready with error "No CNI configuration file in /etc/kubernetes/cni/net.d/" when running a cluster behind a proxy.
Product: OpenShift Container Platform
Reporter: jima
Component: Machine Config Operator
Assignee: Antonio Murdaca <amurdaca>
Status: CLOSED DUPLICATE
QA Contact: Michael Nguyen <mnguyen>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: behoward, jialiu, mstaeble, tsze, wking
Target Milestone: ---
Keywords: Regression, TestBlocker
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-01-27 22:55:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
Attached bootstrap bundle log (flags: none)

Description jima 2021-01-26 11:49:26 UTC
Description:
Installing a upi-on-vsphere cluster behind an HTTPS proxy with 4.7.0-0.nightly-2021-01-25-175331 fails while waiting for bootstrap to complete:

+ '/home/slave7/workspace/Launch Environment Flexy/workdir/openshift-install' wait-for bootstrap-complete --dir '/home/slave7/workspace/Launch Environment Flexy/workdir/install-dir'
level=info msg=Waiting up to 20m0s for the Kubernetes API at https://api.jimaupi.qe.devcluster.openshift.com:6443...
level=info msg=API v1.20.0+70dd98e up
level=info msg=Waiting up to 30m0s for bootstrapping to complete...
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=fatal msg=failed to wait for bootstrapping to complete: timed out waiting for the condition

On a control-plane node, the kubelet log repeatedly reports the following error:
Jan 26 08:54:17 control-plane-0 hyperkube[1574]: E0126 08:54:17.971852    1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Jan 26 08:54:22 control-plane-0 hyperkube[1574]: E0126 08:54:22.972907    1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Jan 26 08:54:27 control-plane-0 hyperkube[1574]: E0126 08:54:27.973599    1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
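
The kubelet message means the runtime found no usable network configuration in /etc/kubernetes/cni/net.d/, a file the network provider is expected to write once it has started (hence "Has your network provider started?"). A minimal Python sketch of that check follows, assuming the same rule as libcni's ConfFiles(): regular files whose extension is .conf, .conflist or .json count as CNI configs. The helper names are hypothetical; this is an illustration, not the runtime's actual code.

```python
import os
from typing import Iterable, List

# Extensions libcni's ConfFiles() accepts for network configs
# (assumption: the container runtime on these nodes applies the same rule).
CNI_CONFIG_EXTENSIONS = {".conf", ".conflist", ".json"}


def filter_cni_configs(names: Iterable[str]) -> List[str]:
    """Keep only file names whose extension marks them as CNI configs."""
    return sorted(n for n in names if os.path.splitext(n)[1] in CNI_CONFIG_EXTENSIONS)


def cni_config_files(net_d: str) -> List[str]:
    """List CNI config files in net_d; [] matches "No CNI configuration file"."""
    if not os.path.isdir(net_d):
        return []
    # Directories are skipped, only regular files are considered.
    regular_files = (
        name for name in os.listdir(net_d)
        if os.path.isfile(os.path.join(net_d, name))
    )
    return filter_cni_configs(regular_files)
```

On an affected node, `cni_config_files("/etc/kubernetes/cni/net.d")` would be expected to return an empty list for as long as the error keeps repeating.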


Version:

$ openshift-install version
4.7.0-0.nightly-2021-01-25-175331

Platform:
upi-on-vsphere behind an HTTPS proxy

What happened?
Installation failed while waiting for bootstrap to complete.

What did you expect to happen?
The cluster is installed successfully.

How to reproduce it (as minimally and precisely as possible)?
Install a upi-on-vsphere cluster behind an HTTPS proxy.

Anything else we need to know?
The issue does not reproduce on 4.7.0-0.nightly-2021-01-22-134922, but two installs on 4.7.0-0.nightly-2021-01-25-175331 both failed with the same error.
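
These two nightlies bracket the regression: one known to work and one that fails. A small Python sketch, assuming that the trailing `YYYY-MM-DD-HHMMSS` part of a nightly name is its build timestamp, can order nightlies and pick out the ones inside that window, which are the candidates worth bisecting. The helper names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Assumption: the suffix after "nightly-" is a sortable build timestamp.
NIGHTLY_RE = re.compile(r"nightly-(\d{4}-\d{2}-\d{2}-\d{6})$")


def nightly_timestamp(version: str) -> datetime:
    """Parse the build timestamp out of a nightly version string."""
    match = NIGHTLY_RE.search(version)
    if match is None:
        raise ValueError(f"not a nightly version: {version!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d-%H%M%S")


def regression_window(candidates: Iterable[str], last_good: str, first_bad: str) -> List[str]:
    """Nightlies built after last_good and up to (including) first_bad."""
    lo, hi = nightly_timestamp(last_good), nightly_timestamp(first_bad)
    return sorted(
        (v for v in candidates if lo < nightly_timestamp(v) <= hi),
        key=nightly_timestamp,
    )
```

Passing every nightly published between the two builds as `candidates` yields the ordered list of builds that could have introduced the change.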

Comment 2 jima 2021-01-26 12:25:41 UTC
Created attachment 1750876
Attached bootstrap bundle log

Comment 3 Johnny Liu 2021-01-26 14:15:15 UTC
After digging a bit further into this failed cluster, it seems this issue was introduced by https://github.com/openshift/machine-config-operator/pull/2342

[root@control-plane-2 ~]# grep -r "etc/mco/proxy.env" /etc/systemd/
/etc/systemd/system/machine-config-daemon-firstboot.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/machine-config-daemon-pull.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/nodeip-configuration.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/pivot.service.d/10-mco-default-env.conf:EnvironmentFile=/etc/mco/proxy.env
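
The grep lists every unit or drop-in under /etc/systemd/ that loads /etc/mco/proxy.env, and kubelet is not among them. The same check can be sketched in Python, which makes it easy to extend, for example to accept the optional '-' prefix systemd allows on EnvironmentFile=. Like `grep -r`, the walker skips symbolic links encountered during recursion (such as the entries systemd typically keeps in *.requires/ directories). The function names are illustrative.

```python
import os
from typing import List

# Environment file that the grep output shows MCO-managed units loading.
PROXY_ENV = "/etc/mco/proxy.env"


def references_env_file(unit_text: str, env_file: str = PROXY_ENV) -> bool:
    """Return True if an EnvironmentFile= line in unit_text names env_file."""
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("EnvironmentFile="):
            continue
        value = line.split("=", 1)[1].strip()
        # systemd accepts a leading '-' meaning "ignore if the file is missing".
        if value.lstrip("-") == env_file:
            return True
    return False


def units_referencing(root: str, env_file: str = PROXY_ENV) -> List[str]:
    """Walk root and list files (relative paths) that load env_file.

    Symbolic links are skipped, mirroring GNU `grep -r` during recursion.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            try:
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: ignore, as grep would report and move on
            if references_env_file(text, env_file):
                hits.append(os.path.relpath(path, root))
    return sorted(hits)
```

On the affected node, `units_referencing("/etc/systemd")` would be expected to report the same four files as the grep, relative to /etc/systemd.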


[root@control-plane-2 ~]# cd /etc/systemd/system/
[root@control-plane-2 system]# pwd
/etc/systemd/system
[root@control-plane-2 system]# ls kubelet.service
kubelet.service           kubelet.service.d/        kubelet.service.requires/ 
[root@control-plane-2 system]# ls kubelet.service*
kubelet.service

kubelet.service.d:
10-mco-default-env.conf  20-logging.conf

kubelet.service.requires:
machine-config-daemon-firstboot.service

[root@control-plane-2 system]# cat kubelet.service.d/10-mco-default-env.conf
[Service]
Environment="GODEBUG=x509ignoreCN=0,madvdontneed=1"

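The proxy configuration is not passed to the kubelet service: kubelet.service.d/10-mco-default-env.conf sets only GODEBUG and, unlike pivot.service.d/10-mco-default-env.conf, has no EnvironmentFile=/etc/mco/proxy.env line.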
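
To make that comparison explicit, the following sketch parses the [Service] section of a drop-in and separates inline Environment= assignments from EnvironmentFile= paths. Applied to the kubelet drop-in above, it finds GODEBUG but no environment file, whereas a drop-in that loads /etc/mco/proxy.env yields that path. It uses Python's shlex as an approximation of systemd's quoting rules, and the function name is hypothetical.

```python
import shlex
from typing import Dict, List, Tuple


def parse_service_env(dropin_text: str) -> Tuple[Dict[str, str], List[str]]:
    """Collect Environment= variables and EnvironmentFile= paths from [Service].

    Approximation: systemd's quoting rules are close to, but not identical
    to, shell quoting, so shlex.split is used as a stand-in.
    """
    env: Dict[str, str] = {}
    env_files: List[str] = []
    section = None
    for raw in dropin_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "Service" or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "Environment":
            # One Environment= line may hold several VAR=value assignments.
            for assignment in shlex.split(value):
                name, _, val = assignment.partition("=")
                env[name] = val
        elif key == "EnvironmentFile":
            env_files.append(value)
    return env, env_files
```

For the kubelet drop-in shown above, the result would be `({"GODEBUG": "x509ignoreCN=0,madvdontneed=1"}, [])`: an empty list of environment files, so no proxy variables reach kubelet through this drop-in.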

[root@control-plane-2 manifests]# systemctl status machine-config-daemon-firstboot.service
● machine-config-daemon-firstboot.service - Machine Config Daemon Firstboot
   Loaded: loaded (/etc/systemd/system/machine-config-daemon-firstboot.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Tue 2021-01-26 08:31:31 UTC; 5h 30min ago
           └─ ConditionPathExists=/etc/ignition-machine-config-encapsulated.json was not met
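
The status output says the firstboot unit was skipped on this boot because its ConditionPathExists= check failed: /etc/ignition-machine-config-encapsulated.json did not exist at 08:31:31, shortly after the reboot logged at 08:31:11. A simplified evaluator for that directive is sketched below. It models only plain and '!'-negated ConditionPathExists= lines and ignores '|' triggering conditions and all other Condition*= directives, so it illustrates the semantics rather than reimplementing systemd. The function name is hypothetical.

```python
import os


def condition_path_exists_met(unit_text: str) -> bool:
    """Evaluate the non-triggering ConditionPathExists= lines of a unit.

    Simplified model: every ConditionPathExists= must hold; a leading '!'
    negates the check. Triggering conditions ('|' prefix) and all other
    Condition*= directives are ignored here.
    """
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line.startswith("ConditionPathExists="):
            continue
        value = line.split("=", 1)[1].strip()
        if value.startswith("|"):
            continue  # triggering conditions are not modelled
        negate = value.startswith("!")
        path = value[1:] if negate else value
        # Fails when the path exists but must not, or is missing but must exist.
        if os.path.exists(path) == negate:
            return False
    return True
```

With a unit containing `ConditionPathExists=/etc/ignition-machine-config-encapsulated.json`, the function returns False whenever that file is absent, which corresponds to the "was not met" line in the status output.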

[root@control-plane-2 manifests]# journalctl -f -u machine-config-daemon-firstboot.service 
-- Logs begin at Tue 2021-01-26 08:28:53 UTC. --
Jan 26 08:31:02 control-plane-2 machine-config-daemon[2138]: I0126 08:31:02.747194    2138 rpm-ostree.go:184] Current origin is not custom
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576186    2138 rpm-ostree.go:211] Pivoting to: 47.83.202101251242-0 (2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391)
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576501    2138 rpm-ostree.go:243] Executing rebase from repo path /run/mco-machine-os-content/os-content-492386407/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e52aead8f8025eb8fe12a385a826a822fa94f9dc89e8d55abcc2bbf718f4b11f and checksum 2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576553    2138 rpm-ostree.go:261] Running captured: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-492386407/srv/repo:2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e52aead8f8025eb8fe12a385a826a822fa94f9dc89e8d55abcc2bbf718f4b11f --custom-origin-description Managed by machine-config-operator
Jan 26 08:31:11 control-plane-2 machine-config-daemon[2138]: I0126 08:31:11.771676    2138 update.go:1858] Rebooting node
Jan 26 08:31:11 control-plane-2 machine-config-daemon[2138]: I0126 08:31:11.776457    2138 update.go:1858] initiating reboot: Completing firstboot provisioning to rendered-master-e54b9be49bc54ed97eb6ff27e32f043d
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=killed, status=15/TERM
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'signal'.
Jan 26 08:31:11 control-plane-2 systemd[1]: Stopped Machine Config Daemon Firstboot.
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Consumed 16.058s CPU time

Comment 4 Matthew Staebler 2021-01-27 14:28:27 UTC
Assuming the analysis in comment 3 is correct, I am moving this bug to MCO.

Comment 5 Ben Howard 2021-01-27 22:55:32 UTC

*** This bug has been marked as a duplicate of bug 1920027 ***