In this job: periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This test case: "[sig-network] pods should successfully create sandboxes by other" is failing frequently in CI, see: https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This job run shows "33 failures to create the sandbox": https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade/1479445783269871616

I noticed that the number sometimes differs (e.g., 22 or 11 instead of 33). Here's some sample output from the job run mentioned above:

```
33 failures to create the sandbox

ns/openshift-kube-controller-manager pod/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 376.91 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-controller-manager_3a2d0426-d4df-4faa-9b7f-47acb5466fda_0(5e0f557c05f0dcb455b548b1213756c0300b814e9961c692b7e9211fc323e4ee): error adding pod openshift-kube-controller-manager_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal/3a2d0426-d4df-4faa-9b7f-47acb5466fda]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf.
pollimmediate error: timed out waiting for the condition

ns/openshift-kube-scheduler pod/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 378.07 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-scheduler_8a09101d-03ba-41b8-a2c9-0c6522ddfeaa_0(db8ee507a85b161a5e42be89f5338c3892deae8ad51560a48ff24d3646442677): error adding pod openshift-kube-scheduler_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal/8a09101d-03ba-41b8-a2c9-0c6522ddfeaa]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf.
pollimmediate error: timed out waiting for the condition
...
```
Note that in TestGrid this looks like a flake, when in fact it's a deficiency in the test framework: we can't tell the difference between a flake and the same test run separately in two invocations. We're working to correct that, but it's a long path to a fix. In the meantime, assume that a flake in this test likely means one suite ran successfully and one hard-failed.
Since the pods are still waiting on the default network, as indicated by:

```
have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf
```

I'm going to move this over for the OVN folks to take a look at. Thanks for the report.
*** This bug has been marked as a duplicate of bug 2038481 ***