Bug 2038386 - periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade
Summary: periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-...
Keywords:
Status: CLOSED DUPLICATE of bug 2038481
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: jamo luhrsen
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-07 19:03 UTC by Dennis Periquet
Modified: 2022-01-10 19:08 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-10 19:08:58 UTC
Target Upstream Version:
Embargoed:



Description Dennis Periquet 2022-01-07 19:03:56 UTC
In this job:
  periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This test case:
  [sig-network] pods should successfully create sandboxes by other

is failing frequently in CI, see:
https://testgrid.k8s.io/redhat-openshift-ocp-release-4.10-informing#periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade

This job shows '33 failures to create the sandbox':
  https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.10-upgrade-from-stable-4.9-e2e-aws-ovn-upgrade/1479445783269871616

I noticed that sometimes the number is different (e.g., 22 or 11 vs. 33). Here's some sample output from the job run mentioned above:

33 failures to create the sandbox
  
ns/openshift-kube-controller-manager pod/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 376.91 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-controller-manager_3a2d0426-d4df-4faa-9b7f-47acb5466fda_0(5e0f557c05f0dcb455b548b1213756c0300b814e9961c692b7e9211fc323e4ee): error adding pod openshift-kube-controller-manager_kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-controller-manager/kube-controller-manager-guard-ip-10-0-149-222.us-west-2.compute.internal/3a2d0426-d4df-4faa-9b7f-47acb5466fda]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
ns/openshift-kube-scheduler pod/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal node/ip-10-0-149-222.us-west-2.compute.internal - 378.07 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal_openshift-kube-scheduler_8a09101d-03ba-41b8-a2c9-0c6522ddfeaa_0(db8ee507a85b161a5e42be89f5338c3892deae8ad51560a48ff24d3646442677): error adding pod openshift-kube-scheduler_openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-scheduler/openshift-kube-scheduler-guard-ip-10-0-149-222.us-west-2.compute.internal/8a09101d-03ba-41b8-a2c9-0c6522ddfeaa]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
...

Comment 1 Devan Goodwin 2022-01-07 19:09:48 UTC
Note that in testgrid this looks like a flake, when in fact it's a deficiency in the test framework: we can't distinguish the results of the same test run separately in two invocations. We're working to correct that, but it's a long path to fix. In the meantime, assume that a flake in this test likely means one suite ran successfully and one hard failed.

Comment 2 Douglas Smith 2022-01-07 19:49:30 UTC
Since it's still waiting on the default network, as indicated by: 

```
have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf
```

I'm going to move this over for the OVN folks to take a look at, thanks for the report.
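
For context, the "pollimmediate error: timed out waiting for the condition" part of the message is the standard Kubernetes wait.PollImmediate helper timing out while Multus waits for the readiness indicator file that the default network plugin (ovn-kubernetes) is supposed to write. A minimal Go sketch of that pattern follows; it is not the actual Multus source, and the 1-second poll interval and 30-second timeout are illustrative assumptions.

```go
// Minimal sketch of a "readiness indicator file" wait, assuming a 1s poll
// interval and 30s timeout (illustrative values, not Multus's real ones).
package main

import (
	"fmt"
	"os"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// waitForReadinessIndicator polls until the CNI config file written by the
// default network exists, or gives up with the familiar
// "timed out waiting for the condition" error.
func waitForReadinessIndicator(path string, timeout time.Duration) error {
	return wait.PollImmediate(1*time.Second, timeout, func() (bool, error) {
		_, err := os.Stat(path)
		if err == nil {
			return true, nil // file present: default network is ready
		}
		if os.IsNotExist(err) {
			return false, nil // not there yet: keep polling
		}
		return false, err // unexpected error: stop polling
	})
}

func main() {
	path := "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf"
	if err := waitForReadinessIndicator(path, 30*time.Second); err != nil {
		fmt.Printf("default network not ready: %v\n", err)
		os.Exit(1)
	}
	fmt.Println("default network ready")
}
```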

Comment 3 jamo luhrsen 2022-01-10 19:08:58 UTC

*** This bug has been marked as a duplicate of bug 2038481 ***

