Bug 1821833 - Missing CNI default network Error
Summary: Missing CNI default network Error
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.5.0
Assignee: Douglas Smith
QA Contact: zhaozhanqi
URL:
Whiteboard:
Depends On:
Blocks: 1824936 1824938
Reported: 2020-04-07 17:10 UTC by Daneyon Hansen
Modified: 2020-07-13 17:26 UTC

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 1824936 1824938
Environment:
Last Closed: 2020-07-13 17:26:04 UTC
Target Upstream Version:
Embargoed:


Links
Github openshift machine-config-operator pull 1623 (closed): Bug 1821833: The Multus CNI configuration file "00-multus.conf" should not be removed (last updated 2021-02-14 11:31:21 UTC)
Red Hat Product Errata RHBA-2020:2409 (last updated 2020-07-13 17:26:31 UTC)

Description Daneyon Hansen 2020-04-07 17:10:55 UTC
Description of problem:
The release-openshift-origin-installer-e2e-gcp-upgrade-4.4 CI job consistently fails [1]. While investigating the cause of the failures, I saw hundreds of instances of the following error:

"network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network"

I think this may be causing a cascading effect that results in a failure to maintain a functioning cluster during an upgrade. I found a similar bug [2], but maybe the fix needs a broader scope? Does [3] fix this issue? If so, it should be backported.

See [4] for additional background.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
The errors are observed in all e2e-gcp-upgrade-4.4 CI job failures.

Steps to Reproduce:
1. See [1]
2. Pick a failed CI job.
3. Search for the "Missing CNI default network" error message.
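
For step 3, a minimal sketch of how the error could be counted in a log file saved from a failed job. The helper name `count_cni_errors` and the file name `kubelet.log` are hypothetical; the report does not say which CI artifact holds the kubelet logs.

```shell
# Hypothetical helper: count the lines in a saved log file that contain
# the error string quoted in this report.
count_cni_errors() {
    # -F matches the text literally; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the
    # function's exit status at 0 while it still prints "0".
    grep -c -F "Missing CNI default network" "$1" || true
}

# Usage (kubelet.log is an assumed file name):
# count_cni_errors kubelet.log
```

A count rather than a plain match makes it easier to compare failed runs against passing ones.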

Actual results:
Failed e2e-gcp-upgrade-4.4 job

Expected results:
Passed e2e-gcp-upgrade-4.4 job

Additional info:
[1] https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-origin-installer-e2e-gcp-upgrade-4.4&sort-by-flakiness=&exclude-non-failed-tests=50

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1754154

[3] https://github.com/openshift/multus-cni/pull/54

[4] https://coreos.slack.com/archives/CDCP2LA9L/p1585683028151700

Comment 2 Daneyon Hansen 2020-04-13 18:49:13 UTC
I believe the underlying issue may be related to [1], where operators are scheduling operands to nodes tainted as not ready.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1753059

Comment 7 zhaozhanqi 2020-04-15 10:04:55 UTC
Verified this bug on 4.5.0-0.nightly-2020-04-14-221451.

1. Reboot the worker server.
2. After it restarts, check the kubelet logs: no new "Missing CNI default network" error messages are generated.
 journalctl -u kubelet | grep "Missing CNI default network"
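
The check above could also be run as a pass/fail test. This is only a sketch under assumptions: `no_cni_errors` is a hypothetical helper name, and the `-b` and `--no-pager` flags in the usage line were not part of the original verification.

```shell
# Hypothetical helper: succeed (exit 0) only if the log text on stdin
# contains no "Missing CNI default network" line.
no_cni_errors() {
    # grep -q exits 0 on a match; "!" inverts that into a failure.
    ! grep -q -F "Missing CNI default network"
}

# Usage on the rebooted node (assumption: -b limits output to the current boot,
# so errors logged before the reboot are not counted):
# journalctl -u kubelet -b --no-pager | no_cni_errors && echo "no new errors"
```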

Comment 8 errata-xmlrpc 2020-07-13 17:26:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409

