Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2038386

Summary: periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade
Product: OpenShift Container Platform
Reporter: Dennis Periquet <dperique>
Component: Networking
Assignee: jamo luhrsen <jluhrsen>
Networking sub component: ovn-kubernetes
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE
Docs Contact:
Severity: medium
Priority: unspecified
CC: bpickard, dgoodwin, sippy
Version: 4.9
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-01-10 19:08:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Dennis Periquet 2022-01-07 19:03:56 UTC
In this job:
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This test case:
  [sig-network] pods should successfully create sandboxes by other

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This job shows '33 failures to create the sandbox':
  https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade/1479445783269871616

I noticed that sometimes the number is different (e.g., 22 or 11 instead of 33).  Here's some sample output from the job run mentioned above:

33 failures to create the sandbox
  
ns/openshift-kube-controller-manager pod/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 376.91 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-controller-manager_3a2d0426-d4df-4faa-9b7f-47acb5466fda_0(5e0f557c05f0dcb455b548b1213756c0300b814e9961c692b7e9211fc323e4ee): error adding pod openshift-kube-controller-manager_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal/3a2d0426-d4df-4faa-9b7f-47acb5466fda]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
ns/openshift-kube-scheduler pod/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 378.07 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-scheduler_8a09101d-03ba-41b8-a2c9-0c6522ddfeaa_0(db8ee507a85b161a5e42be89f5338c3892deae8ad51560a48ff24d3646442677): error adding pod openshift-kube-scheduler_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal/8a09101d-03ba-41b8-a2c9-0c6522ddfeaa]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
...
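
For context on the mechanism: the "pollimmediate error: timed out waiting for the condition" text in the messages above comes from the standard Kubernetes wait helpers. Multus polls for the default network's readiness indicator file and gives up after a timeout, which then surfaces as FailedCreatePodSandBox. A minimal sketch of that consumer-side pattern (illustrative function name and timeout, not Multus's actual code):

```
package main

import (
	"fmt"
	"os"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForReadinessIndicator polls until the readiness indicator file exists
// (i.e. the default network plugin has written its CNI config) or the timeout
// expires. Illustrative only; Multus's real implementation differs in detail.
func waitForReadinessIndicator(path string, timeout time.Duration) error {
	return wait.PollImmediate(1*time.Second, timeout, func() (bool, error) {
		if _, err := os.Stat(path); err == nil {
			return true, nil // default network is ready
		}
		return false, nil // not there yet; keep polling
	})
}

func main() {
	// Path taken from the error messages in this bug.
	const indicator = "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf"
	if err := waitForReadinessIndicator(indicator, 30*time.Second); err != nil {
		// On timeout this is the "timed out waiting for the condition" seen above.
		fmt.Fprintf(os.Stderr, "default network not ready: %v\n", err)
		os.Exit(1)
	}
}
```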

Comment 1 Devan Goodwin 2022-01-07 19:09:48 UTC
Note that in testgrid this looks like a flake, when in fact it's a deficiency in the test framework: we can't tell the difference between the same test being run separately in two invocations. We're working to correct that, but it's a long path to fix. In the meantime, assume that a flake in this test likely means one suite ran successfully and one hard failed.

Comment 2 Douglas Smith 2022-01-07 19:49:30 UTC
Since it's still waiting on the default network, as indicated by: 

```
have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf
```

I'm going to move this over for the OVN folks to take a look at. Thanks for the report.
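
For whoever picks this up on the OVN side: the readiness indicator here is just the default network's CNI config file, which ovn-kubernetes is expected to drop under /var/run/multus/cni/net.d once it is functional, so the question is why that file showed up late (or not at all) on this node during the upgrade. A rough sketch of that producer side, with a hypothetical helper name and placeholder config contents, not the actual ovn-kubernetes code:

```
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeReadinessIndicator atomically writes the default-network CNI config.
// Until this file exists, Multus keeps polling and sandbox creation fails as
// in the events above. Hypothetical helper; the real ovn-kubernetes code and
// config contents differ.
func writeReadinessIndicator(dir, name string, conf []byte) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	tmp := filepath.Join(dir, name+".tmp")
	if err := os.WriteFile(tmp, conf, 0o644); err != nil {
		return err
	}
	// Rename within the same directory is atomic, so Multus never observes a
	// partially written config.
	return os.Rename(tmp, filepath.Join(dir, name))
}

func main() {
	conf := []byte(`{"cniVersion":"0.4.0","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay"}`)
	if err := writeReadinessIndicator("/var/run/multus/cni/net.d", "10-ovn-kubernetes.conf", conf); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```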

Comment 3 jamo luhrsen 2022-01-10 19:08:58 UTC

*** This bug has been marked as a duplicate of bug 2038481 ***