Bug 2038386 - periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade
Summary: periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-...
Keywords:
Status: CLOSED DUPLICATE of bug 2038481
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: jamo luhrsen
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-07 19:03 UTC by Dennis Periquet
Modified: 2022-01-10 19:08 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-10 19:08:58 UTC
Target Upstream Version:
Embargoed:



Description Dennis Periquet 2022-01-07 19:03:56 UTC
In this job:
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This test case:
  [sig-network] pods should successfully create sandboxes by other

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This job shows '33 failures to create the sandbox':
  https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade/1479445783269871616

I noticed that sometimes the number is different (e.g., 22 or 11 vs. 33). Here's some sample output from the job run mentioned above:

33 failures to create the sandbox
  
ns/openshift-kube-controller-manager pod/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 376.91 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-controller-manager_3a2d0426-d4df-4faa-9b7f-47acb5466fda_0(5e0f557c05f0dcb455b548b1213756c0300b814e9961c692b7e9211fc323e4ee): error adding pod openshift-kube-controller-manager_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal/3a2d0426-d4df-4faa-9b7f-47acb5466fda]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
ns/openshift-kube-scheduler pod/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 378.07 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-scheduler_8a09101d-03ba-41b8-a2c9-0c6522ddfeaa_0(db8ee507a85b161a5e42be89f5338c3892deae8ad51560a48ff24d3646442677): error adding pod openshift-kube-scheduler_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal/8a09101d-03ba-41b8-a2c9-0c6522ddfeaa]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
...

Comment 1 Devan Goodwin 2022-01-07 19:09:48 UTC
Note that in testgrid this looks like a flake, when in fact it's a deficiency in the test framework: we can't distinguish the results of the same test run separately in two invocations. We're working to correct that, but it's a long path to fix. In the meantime, assume that a flake in this test likely means one suite ran successfully and one hard failed.

Comment 2 Douglas Smith 2022-01-07 19:49:30 UTC
Since it's still waiting on the default network, as indicated by: 

```
have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf
```

I'm going to move this over for the OVN folks to take a look at, thanks for the report.
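
For context, the "pollimmediate error: timed out waiting for the condition" part of the message is the standard Kubernetes wait.PollImmediate helper timing out while Multus waits for the readiness indicator file that the default network plugin (ovn-kubernetes) is supposed to write. A minimal Go sketch of that pattern follows; it is not the actual Multus source, and the 1-second poll interval and 30-second timeout are illustrative assumptions.

```go
// Minimal sketch of a "readiness indicator file" wait, assuming a 1s poll
// interval and 30s timeout (illustrative values, not Multus's real ones).
package main

import (
	"fmt"
	"os"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForReadinessIndicator polls until the CNI config file written by the
// default network exists, or gives up with the familiar
// "timed out waiting for the condition" error.
func waitForReadinessIndicator(path string, timeout time.Duration) error {
	return wait.PollImmediate(1*time.Second, timeout, func() (bool, error) {
		_, err := os.Stat(path)
		if err == nil {
			return true, nil // file present: default network is ready
		}
		if os.IsNotExist(err) {
			return false, nil // not there yet: keep polling
		}
		return false, err // unexpected error: stop polling
	})
}

func main() {
	path := "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf"
	if err := waitForReadinessIndicator(path, 30*time.Second); err != nil {
		fmt.Printf("default network not ready: %v\n", err)
		os.Exit(1)
	}
	fmt.Println("default network ready")
}
```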

Comment 3 jamo luhrsen 2022-01-10 19:08:58 UTC

*** This bug has been marked as a duplicate of bug 2038481 ***

