An e2e test verifying that both the Kube and OpenShift apiservers remain available during upgrade failed in a 4.4 to 4.4 upgrade job. The snippet that seems to highlight the reason for the failure:

Jan 12 20:58:46.230 I ns/openshift-sdn daemonset/sdn-controller Deleted pod: sdn-controller-lkj6v
Jan 12 20:58:46.230 I ns/openshift-sdn pod/sdn-7d6g9 Pulling image "registry.svc.ci.openshift.org/ci-op-jbtg7jjb/stable@sha256:f8de726661ce92ee52c4de8498a9f2868a4569b7ae62e59442d09ccbb78302b5"
Jan 12 20:58:46.364 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:48.379 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (2 times)
Jan 12 20:58:48.387 W ns/openshift-machine-api pod/cluster-autoscaler-operator-748f454f48-xlbsk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:48.701 W ns/openshift-operator-lifecycle-manager pod/catalog-operator-86488444c-v4h5q Readiness probe failed: Get http://10.129.0.46:8080/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (2 times)
Jan 12 20:58:49.097 W ns/openshift-apiserver pod/apiserver-zg25k Readiness probe failed: Get https://10.129.0.43:8443/healthz: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) (2 times)
Jan 12 20:58:49.374 W ns/openshift-cluster-node-tuning-operator pod/cluster-node-tuning-operator-5c859c6585-kb6ph network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:49.742 W node/ip-10-0-157-152.ec2.internal condition Ready changed
Jan 12 20:58:49.745 I node/ip-10-0-157-152.ec2.internal Node ip-10-0-157-152.ec2.internal status is now: NodeReady (2 times)
Jan 12 20:58:49.882 I ns/openshift-machine-api machine/ci-op-jbtg7jjb-77109-dx8t6-worker-us-east-1b-tn6s4 Updated machine ci-op-jbtg7jjb-77109-dx8t6-worker-us-east-1b-tn6s4 (3 times)
Jan 12 20:58:50.366 W ns/openshift-ingress-operator pod/ingress-operator-8c8c9579c-hph6g network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network
Jan 12 20:58:50.373 W ns/openshift-controller-manager pod/controller-manager-g9tkl network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (3 times)
Jan 12 20:58:50.381 W ns/openshift-machine-api pod/cluster-autoscaler-operator-748f454f48-xlbsk network is not ready: runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: Missing CNI default network (2 times)

Which looks like:

1. openshift/sdn on a node is updated
2. 8-12 seconds later openshift-apiserver (on the pod network) fails readiness checks and is taken out of rotation

At first glance this would be a very serious bug if upgrading openshift-sdn caused a disruption to pods on the pod network.

https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade/14098
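For reference, the availability check that failed amounts to polling the apiserver health endpoints throughout the upgrade and flagging any window in which requests fail. Below is a minimal Go sketch of that idea, not the actual origin test code; the target URL (a pod IP lifted from the log above), the one-second poll interval, the five-second timeout, and the skipped TLS verification are all assumptions for illustration.

// Minimal sketch of an apiserver availability poller, assuming a direct
// HTTPS probe of a pod IP; the real e2e suite is more involved.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 5 * time.Second,
		// Assumption: skip certificate verification when probing a raw pod IP,
		// since the serving cert will not match that address.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}

	// Hypothetical target taken from the readiness-probe failure in the log above.
	const target = "https://10.129.0.43:8443/healthz"

	var outageStart time.Time
	for range time.Tick(time.Second) {
		resp, err := client.Get(target)
		healthy := err == nil && resp.StatusCode == http.StatusOK
		if resp != nil {
			resp.Body.Close()
		}
		switch {
		case !healthy && outageStart.IsZero():
			// Transition from healthy to unhealthy: start timing the outage.
			outageStart = time.Now()
		case healthy && !outageStart.IsZero():
			// Recovered: report how long the endpoint was unavailable.
			fmt.Printf("apiserver unavailable for %s\n", time.Since(outageStart))
			outageStart = time.Time{}
		}
	}
}

Under this sketch, the 8-12 second readiness outage described above would surface as a single "apiserver unavailable for ..." window once the endpoint starts answering again.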
May be related to bug 1791162, but I'm not sure.
*** This bug has been marked as a duplicate of bug 1785457 ***