
Bug 1821833

Summary:	Missing CNI default network Error
Product:	OpenShift Container Platform	Reporter:	Daneyon Hansen <dhansen>
Component:	Networking	Assignee:	Douglas Smith <dosmith>
Networking sub component:	openshift-sdn	QA Contact:	zhaozhanqi <zzhao>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	high
Priority:	unspecified	CC:	anbhat
Version:	4.5
Target Milestone:	---
Target Release:	4.5.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:
Clones:	1824936 1824938		Environment:
Last Closed:	2020-07-13 17:26:04 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1824936, 1824938

Description Daneyon Hansen 2020-04-07 17:10:55 UTC

Description of problem:
The release-openshift-origin-installer-e2e-gcp-upgrade-4.4 CI job consistently fails [1]. While investigating the cause of the failures, I see hundreds of instances of the following error:

"network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network"

I think this may be causing a cascading effect that results in a failure to maintain a functioning cluster during an upgrade. I found a similar bug [2], but maybe the fix needs a broader scope? Does [3] fix this issue? If so, it should be backported.

See [4] for additional background.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
The errors are observed in all e2e-gcp-upgrade-4.4 CI job failures.

Steps to Reproduce:
1. See [1]
2. Pick a failed CI job.
3. Search for the "Missing CNI default network" error message.
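
Step 3 can be scripted once a failed job's logs have been downloaded. The following is a minimal sketch, assuming the logs are available as a local plain-text file; the function name `count_cni_errors` and the file path in the usage note are hypothetical and not part of the CI tooling.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the number of lines in a local log file
# that contain the error string reported in this bug.
count_cni_errors() {
  # -F matches the pattern as a fixed string; -c prints the count of matching lines.
  # grep exits with status 1 when nothing matches (it still prints 0),
  # so treat status 1 as success and let real errors (status 2) propagate.
  grep -c -F 'Missing CNI default network' -- "$1" || [ "$?" -eq 1 ]
}
```

For example, `count_cni_errors ./artifacts/kubelet.log` (an assumed path) prints the number of matching lines in that file.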

Actual results:
Failed e2e-gcp-upgrade-4.4 job

Expected results:
Passed e2e-gcp-upgrade-4.4 job

Additional info:
[1] https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-origin-installer-e2e-gcp-upgrade-4.4&sort-by-flakiness=&exclude-non-failed-tests=50

[2] https://bugzilla.redhat.com/show_bug.cgi?id=1754154

[3] https://github.com/openshift/multus-cni/pull/54

[4] https://coreos.slack.com/archives/CDCP2LA9L/p1585683028151700

Comment 2 Daneyon Hansen 2020-04-13 18:49:13 UTC

I believe the underlying issue may be related to [1], where operators are scheduling operands to nodes tainted as not ready.


[1] https://bugzilla.redhat.com/show_bug.cgi?id=1753059

Comment 7 zhaozhanqi 2020-04-15 10:04:55 UTC

Verified this bug on 4.5.0-0.nightly-2020-04-14-221451.

1. Reboot the worker server.
2. After the restart, check the kubelet logs; no new "Missing CNI default network" error messages are generated.
 journalctl -u kubelet | grep "Missing CNI default network"
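
The check above can be wrapped so that it returns an explicit pass/fail status. This is only a sketch: the function name `no_cni_errors` is hypothetical, and it reads log text from standard input so that it can be fed from `journalctl` or from a saved log.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit status 0) only if no line on stdin
# contains the error string; fail (exit status 1) if the error is present.
no_cni_errors() {
  # -q suppresses output; -F matches the pattern as a fixed string.
  # The leading "!" inverts grep's status, so "no match" becomes success.
  ! grep -q -F 'Missing CNI default network'
}
```

On the rebooted node this could be run as `journalctl -u kubelet -b --no-pager | no_cni_errors && echo PASS`. The `-b` flag, which limits the output to the current boot, is an addition here and was not part of the original verification command.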

Comment 8 errata-xmlrpc 2020-07-13 17:26:04 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409