Bug 1920483 - Bootstrap completed failed: Network is not ready with error "No CNI configuration file in /etc/kubernetes/cni/net.d/" when running a cluster behind proxy.
Keywords:
Status: CLOSED DUPLICATE of bug 1920027
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Machine Config Operator
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: Antonio Murdaca
QA Contact: Michael Nguyen
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-01-26 11:49 UTC by jima
Modified: 2021-01-27 22:55 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-27 22:55:32 UTC
Target Upstream Version:
Embargoed:


Attachments
Attached bootstrap bundle log (3.94 MB, application/gzip)
2021-01-26 12:25 UTC, jima

Description jima 2021-01-26 11:49:26 UTC
Description:
Installed a cluster on upi-on-vsphere behind an HTTPS proxy with 4.7.0-0.nightly-2021-01-25-175331, and waiting for bootstrap to complete failed:

+ '/home/slave7/workspace/Launch Environment Flexy/workdir/openshift-install' wait-for bootstrap-complete --dir '/home/slave7/workspace/Launch Environment Flexy/workdir/install-dir'
level=info msg=Waiting up to 20m0s for the Kubernetes API at https://api.jimaupi.qe.devcluster.openshift.com:6443...
level=info msg=API v1.20.0+70dd98e up
level=info msg=Waiting up to 30m0s for bootstrapping to complete...
level=info msg=Use the following commands to gather logs from the cluster
level=info msg=openshift-install gather bootstrap --help
level=fatal msg=failed to wait for bootstrapping to complete: timed out waiting for the condition

In the kubelet log on the control plane node, the following error is reported repeatedly:
Jan 26 08:54:17 control-plane-0 hyperkube[1574]: E0126 08:54:17.971852    1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Jan 26 08:54:22 control-plane-0 hyperkube[1574]: E0126 08:54:22.972907    1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Jan 26 08:54:27 control-plane-0 hyperkube[1574]: E0126 08:54:27.973599    1574 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
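
To confirm how often the kubelet reports this condition in a saved journal excerpt, the lines can be tallied per CNI directory. This is a minimal sketch, assuming the message format matches the lines quoted above; the function name `summarize_cni_errors` is hypothetical and not part of any OpenShift tooling.

```python
import re
from collections import Counter

# Pattern taken from the kubelet message quoted in this report (assumption:
# other kubelet versions may word the message differently).
CNI_NOT_READY = re.compile(
    r"NetworkPluginNotReady message:Network plugin returns error: "
    r"No CNI configuration file in (\S+)\. Has your network provider started\?"
)


def summarize_cni_errors(lines):
    """Count 'No CNI configuration file' errors per reported directory."""
    counts = Counter()
    for line in lines:
        match = CNI_NOT_READY.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)
```

For example, feeding it the output of `journalctl -u kubelet` saved to a file (one log entry per line) gives a quick view of whether the node ever left the not-ready state.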


Version:

$ openshift-install version
4.7.0-0.nightly-2021-01-25-175331

Platform:
upi-on-vsphere behind https proxy

What happened?
Installation failed while waiting for bootstrap to complete.

What did you expect to happen?
Cluster is installed successfully

How to reproduce it (as minimally and precisely as possible)?
Install a cluster on upi-on-vsphere behind a proxy.

Anything else we need to know?
The issue does not reproduce on 4.7.0-0.nightly-2021-01-22-134922, but two installs with 4.7.0-0.nightly-2021-01-25-175331 both failed with the same error.

Comment 2 jima 2021-01-26 12:25:41 UTC
Created attachment 1750876 [details]
Attached bootstrap bundle log

Comment 3 Johnny Liu 2021-01-26 14:15:15 UTC
After digging a bit more into this failed cluster, it seems this issue was introduced by https://github.com/openshift/machine-config-operator/pull/2342

[root@control-plane-2 ~]# grep -r "etc/mco/proxy.env" /etc/systemd/
/etc/systemd/system/machine-config-daemon-firstboot.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/machine-config-daemon-pull.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/nodeip-configuration.service:EnvironmentFile=/etc/mco/proxy.env
/etc/systemd/system/pivot.service.d/10-mco-default-env.conf:EnvironmentFile=/etc/mco/proxy.env


[root@control-plane-2 ~]# cd /etc/systemd/system/
[root@control-plane-2 system]# pwd
/etc/systemd/system
[root@control-plane-2 system]# ls kubelet.service
kubelet.service           kubelet.service.d/        kubelet.service.requires/ 
[root@control-plane-2 system]# ls kubelet.service*
kubelet.service

kubelet.service.d:
10-mco-default-env.conf  20-logging.conf

kubelet.service.requires:
machine-config-daemon-firstboot.service

[root@control-plane-2 system]# cat kubelet.service.d/10-mco-default-env.conf
[Service]
Environment="GODEBUG=x509ignoreCN=0,madvdontneed=1"

The proxy configuration file is not dropped into the kubelet service folder (kubelet.service.d).
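
The manual check above (which files load /etc/mco/proxy.env, and whether any kubelet file does) could be scripted for repeated runs across nodes. This is a minimal sketch, assuming the systemd layout shown in this comment; the function names `files_loading_env_file` and `kubelet_loads_env_file` are hypothetical and not part of the Machine Config Operator.

```python
from pathlib import Path


def files_loading_env_file(systemd_dir, env_file="/etc/mco/proxy.env"):
    """Return sorted paths (relative to systemd_dir) of files that contain
    an EnvironmentFile= line pointing at env_file."""
    root = Path(systemd_dir)
    hits = []
    for path in root.rglob("*"):
        # Skip symlinks (e.g. entries under *.requires/) and non-files,
        # similar to what `grep -r` does during recursion.
        if path.is_symlink() or not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue
        for line in text.splitlines():
            line = line.strip()
            if not line.startswith("EnvironmentFile="):
                continue
            # systemd allows a leading "-" to mark the file as optional.
            value = line.split("=", 1)[1].lstrip("-")
            if value == env_file:
                hits.append(path.relative_to(root).as_posix())
                break
    return sorted(hits)


def kubelet_loads_env_file(systemd_dir, env_file="/etc/mco/proxy.env"):
    """True if kubelet.service or one of its drop-ins loads env_file."""
    return any(
        part.startswith("kubelet.service")
        for hit in files_loading_env_file(systemd_dir, env_file)
        for part in Path(hit).parts
    )
```

Calling `kubelet_loads_env_file("/etc/systemd")` on the affected node would be expected to return False given the listing above, since only the firstboot, pull, nodeip-configuration, and pivot files reference the proxy environment file.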

[root@control-plane-2 manifests]# systemctl status machine-config-daemon-firstboot.service
● machine-config-daemon-firstboot.service - Machine Config Daemon Firstboot
   Loaded: loaded (/etc/systemd/system/machine-config-daemon-firstboot.service; enabled; vendor preset: enabled)
   Active: inactive (dead)
Condition: start condition failed at Tue 2021-01-26 08:31:31 UTC; 5h 30min ago
           └─ ConditionPathExists=/etc/ignition-machine-config-encapsulated.json was not met

[root@control-plane-2 manifests]# journalctl -f -u machine-config-daemon-firstboot.service 
-- Logs begin at Tue 2021-01-26 08:28:53 UTC. --
Jan 26 08:31:02 control-plane-2 machine-config-daemon[2138]: I0126 08:31:02.747194    2138 rpm-ostree.go:184] Current origin is not custom
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576186    2138 rpm-ostree.go:211] Pivoting to: 47.83.202101251242-0 (2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391)
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576501    2138 rpm-ostree.go:243] Executing rebase from repo path /run/mco-machine-os-content/os-content-492386407/srv/repo with customImageURL pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e52aead8f8025eb8fe12a385a826a822fa94f9dc89e8d55abcc2bbf718f4b11f and checksum 2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391
Jan 26 08:31:04 control-plane-2 machine-config-daemon[2138]: I0126 08:31:04.576553    2138 rpm-ostree.go:261] Running captured: rpm-ostree rebase --experimental /run/mco-machine-os-content/os-content-492386407/srv/repo:2413c3249f9661b967a2cd9eef5822fe20b87b4b41bc4901721da9f5b6760391 --custom-origin-url pivot://quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e52aead8f8025eb8fe12a385a826a822fa94f9dc89e8d55abcc2bbf718f4b11f --custom-origin-description Managed by machine-config-operator
Jan 26 08:31:11 control-plane-2 machine-config-daemon[2138]: I0126 08:31:11.771676    2138 update.go:1858] Rebooting node
Jan 26 08:31:11 control-plane-2 machine-config-daemon[2138]: I0126 08:31:11.776457    2138 update.go:1858] initiating reboot: Completing firstboot provisioning to rendered-master-e54b9be49bc54ed97eb6ff27e32f043d
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Main process exited, code=killed, status=15/TERM
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Failed with result 'signal'.
Jan 26 08:31:11 control-plane-2 systemd[1]: Stopped Machine Config Daemon Firstboot.
Jan 26 08:31:11 control-plane-2 systemd[1]: machine-config-daemon-firstboot.service: Consumed 16.058s CPU time

Comment 4 Matthew Staebler 2021-01-27 14:28:27 UTC
Assuming the analysis in comment 3 is correct, I am moving this bug to MCO.

Comment 5 Ben Howard 2021-01-27 22:55:32 UTC

*** This bug has been marked as a duplicate of bug 1920027 ***

