Description of problem:
The release-openshift-origin-installer-e2e-gcp-upgrade-4.4 CI job consistently fails [1]. While investigating the cause of the failures, I see hundreds of instances of the following error:

"network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network"

I think this may be causing a cascading effect that results in a failure to maintain a functioning cluster during an upgrade. I found a similar bug [2], but maybe the fix needs a broader scope? Does [3] fix this issue? If so, it should be backported. See [4] for additional background.

Version-Release number of selected component (if applicable):
4.4

How reproducible:
The errors are observed in all e2e-gcp-upgrade-4.4 CI job failures.

Steps to Reproduce:
1. See [1].
2. Pick a failed CI job.
3. Search for the "Missing CNI default network" error message.

Actual results:
Failed e2e-gcp-upgrade-4.4 job

Expected results:
Passed e2e-gcp-upgrade-4.4 job

Additional info:
[1] https://testgrid.k8s.io/redhat-openshift-ocp-release-4.4-informing#release-openshift-origin-installer-e2e-gcp-upgrade-4.4&sort-by-flakiness=&exclude-non-failed-tests=50
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1754154
[3] https://github.com/openshift/multus-cni/pull/54
[4] https://coreos.slack.com/archives/CDCP2LA9L/p1585683028151700
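Step 3 of the reproduction steps could be scripted roughly as follows. This is only a sketch: it assumes the artifacts of a failed job have already been downloaded to a local directory, and the function name and the example path are hypothetical, not something the CI tooling provides.

```shell
#!/usr/bin/env bash
# Sketch only: count "Missing CNI default network" occurrences in CI artifacts.
# The directory argument is a hypothetical local download location.

CNI_ERROR="Missing CNI default network"

count_cni_errors() {
  local dir="$1"
  # -r recurse, -h omit file names, -o print each match on its own line,
  # -F treat the pattern as a fixed string. wc -l then counts the matches.
  grep -rhoF -- "$CNI_ERROR" "$dir" | wc -l | tr -d '[:space:]'
}

# Example (hypothetical path):
#   count_cni_errors ./artifacts/e2e-gcp-upgrade
```

Counting matches rather than matching lines means several occurrences on one log line are each counted, which may matter if the logs were concatenated or re-wrapped.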
I believe the underlying issue may be related to [1], where operators are scheduling operands to nodes tainted as not ready.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1753059
Verified this bug on 4.5.0-0.nightly-2020-04-14-221451.

1. Reboot the worker server.
2. After it has restarted, check the kubelet logs. No new "Missing CNI default network" error messages are generated:

   journalctl -u kubelet | grep "Missing CNI default network"
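The verification check above could be turned into a pass/fail test along these lines. It is a sketch under assumptions: the function name is hypothetical, and restricting the journal to the current boot with `-b` is an addition not present in the original command, made here so that errors logged before the reboot are not counted as new.

```shell
#!/usr/bin/env bash
# Sketch only: succeed if the kubelet log text on stdin does NOT contain the
# CNI error, fail otherwise.

kubelet_log_is_clean() {
  # grep -q exits 0 on a match, so invert it: a match means the log is not clean.
  ! grep -qF -- "Missing CNI default network"
}

# Example, run on the rebooted worker (-b limits output to the current boot):
#   journalctl -u kubelet -b --no-pager | kubelet_log_is_clean && echo "clean"
```

Reading from stdin keeps the check independent of where the log comes from, so the same function could be pointed at a saved journal file instead of a live node.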
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409