Bug 2040263 - [sig-network] pods should successfully create sandboxes by other
Summary: [sig-network] pods should successfully create sandboxes by other
Status: CLOSED DUPLICATE of bug 2038481
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.9
Hardware: s390x
OS: Linux
Target Milestone: ---
Assignee: jamo luhrsen
QA Contact: zhaozhanqi
Depends On:
Reported: 2022-01-13 10:38 UTC by Surender Yadav
Modified: 2022-03-31 22:44 UTC (History)
1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2022-03-31 22:44:42 UTC
Target Upstream Version:


Description Surender Yadav 2022-01-13 10:38:25 UTC
Description of problem:

The s390x CI upgrade jobs (4.9 to 4.10) fail with the error "[sig-network] pods should successfully create sandboxes by other"

How reproducible:


Steps to Reproduce:


Actual results:
The test fails.

Expected results:
The test should pass.

Comment 1 Surender Yadav 2022-01-20 16:10:25 UTC
We are still observing job failures with this test.

Comment 2 Douglas Smith 2022-02-02 15:09:41 UTC
This looks like the default network wasn't ready. Sending this along to the SDN team for triage.

From the provided CI run:

ns/openshift-kube-scheduler pod/openshift-kube-scheduler-guard-libvirt-s390x-0-2-708-h989f-master-0 node/libvirt-s390x-0-2-708-h989f-master-0 - 828.97 seconds after deletion - reason/FailedCreatePodSandBox Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_openshift-kube-scheduler-guard-libvirt-s390x-0-2-708-h989f-master-0_openshift-kube-scheduler_8f7882c3-ff56-4cba-8624-d856020c678b_0(b7ded924f1a87d15bd0040c9d6f02fb559814800ab7a9ef9bba279d3c9192ff4): error adding pod openshift-kube-scheduler_openshift-kube-scheduler-guard-libvirt-s390x-0-2-708-h989f-master-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-kube-scheduler/openshift-kube-scheduler-guard-libvirt-s390x-0-2-708-h989f-master-0/8f7882c3-ff56-4cba-8624-d856020c678b]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/80-openshift-network.conf. pollimmediate error: timed out waiting for the condition
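For triage, the two fields that matter in an event message like the one above are the CNI network name and the readiness-indicator file Multus is waiting on. A minimal best-effort sketch of pulling them out of the raw event text (the regex patterns here are assumptions inferred from the message format above, not a documented API):

```python
import re

def parse_sandbox_failure(message: str) -> dict:
    """Best-effort extraction of the CNI network name and the
    readiness-indicator file path from a FailedCreatePodSandBox
    event message (patterns inferred from the observed message)."""
    fields = {}
    # e.g. ... to CNI network "multus-cni-network": ...
    m = re.search(r'CNI network "([^"]+)"', message)
    if m:
        fields["cni_network"] = m.group(1)
    # e.g. ... readinessindicatorfile @ /var/run/.../80-openshift-network.conf. ...
    m = re.search(r'readinessindicatorfile @ ([^\s]+\.conf)', message)
    if m:
        fields["readiness_file"] = m.group(1)
    return fields

# Abbreviated copy of the event message from the CI run above.
msg = ('error adding pod to CNI network "multus-cni-network": '
       'plugin type="multus" name="multus-cni-network" failed (add): '
       'still waiting for readinessindicatorfile '
       '@ /var/run/multus/cni/net.d/80-openshift-network.conf. '
       'pollimmediate error: timed out waiting for the condition')
print(parse_sandbox_failure(msg))
```

The presence of the readiness-indicator timeout tells you the sandbox failure is downstream of the default network not being ready, which is what routes this to the SDN team rather than the node team.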

Comment 3 jamo luhrsen 2022-02-08 03:45:09 UTC
This is likely the same as (or simply a duplicate of) https://bugzilla.redhat.com/show_bug.cgi?id=2038481.
Notice that the pods in question are "guard" pods, and note the amount of time reported (e.g., 828.97 seconds).
Most likely these guard pods are being drained as part of the node reboot during the upgrade, but they
incorrectly come back to the node even though it is marked unschedulable.

There was some work done recently to fix 2038481, and I notice this bug was filed almost a month ago.
I looked at search.ci and did not see this specific error any more. @suryadav, can you
check whether this is still happening? If not, you can just close this as a duplicate of 2038481. If it's
still happening, I can try to help find the right team to get it resolved.

Comment 4 jamo luhrsen 2022-03-31 22:44:42 UTC
Closing this as a duplicate of 2038481. The search provided in the description
will still turn up failures in this test case, which looks for failed-sandbox issues,
but the handful I looked at just now were not caused by the guard pods that were
fixed in 2038481.

*** This bug has been marked as a duplicate of bug 2038481 ***
