Bug 2077900 - OCP 4.11 - error adding pod XXX to CNI network "multus-cni-network" (...) /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
Keywords:
Status: CLOSED DUPLICATE of bug 2078866
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Mohamed Mahmoud
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2022-04-22 14:24 UTC by Ramon Perez
Modified: 2022-05-06 15:53 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-05-06 15:53:13 UTC
Target Upstream Version:
Embargoed:



Description Ramon Perez 2022-04-22 14:24:12 UTC
Description of problem:

After successfully deploying OCP 4.11 on a cluster, we see many failures during subsequent deployments of operators and workloads. They are caused by the following issue, which involves ovn-kubernetes:

"error adding pod XXX to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-network-diagnostics/network-check-target-ncvhd/3bb3b05e-1ef5-496e-b67b-88a6aba42cbf]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition"
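For readers unfamiliar with the wording of this error, the following is a minimal Python sketch of "poll immediate" semantics: the condition is checked once right away, then at a fixed interval until a timeout expires. It is not Multus's actual code; the function names, interval, and timeout are hypothetical and chosen only for illustration.

```python
import os
import time


def poll_immediate(condition, interval_s, timeout_s):
    """Check `condition` right away, then every `interval_s` seconds until `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            # Corresponds to "timed out waiting for the condition".
            return False
        time.sleep(interval_s)


def wait_for_readiness_file(path, interval_s=1.0, timeout_s=10.0):
    """Return True once `path` exists, False if the timeout expires first.

    The default interval and timeout are illustrative assumptions only.
    """
    return poll_immediate(lambda: os.path.exists(path), interval_s, timeout_s)
```

Under this reading, the error means that /var/run/multus/cni/net.d/10-ovn-kubernetes.conf never appeared on the node within the wait window, which suggests that the default network (ovn-kubernetes) was not ready there.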

We use the Ansible community.kubernetes.k8s module to automate the creation of the resources, and running it two or more times triggers this behavior. We use DCI to automate the whole process.

The purpose of this Bugzilla is to report this behavior so that it can be fixed quickly in upcoming OCP 4.11 releases.

Version-Release number of selected component (if applicable):

OCP 4.11.0 builds released from 2022-04-07 through 2022-04-16, which is the latest one tested.

How reproducible:

(Note that it does not happen every time, and when it does, it may affect different operators and/or workloads deployed after the OCP installation. The Actual results and Additional info sections describe the cases observed.)

Steps to Reproduce:

1. Deploy OCP 4.11 on a cluster composed of 3 master nodes and 3 worker nodes, using IPI installation and the Ansible playbooks from baremetal-deployment. DCI automation was used for this purpose.

2. Deploy the operators and/or workloads to be used in the OCP cluster.

Actual results:

In many cases, pods fail at creation time with the error reported above: "error adding pod XXX to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [openshift-network-diagnostics/network-check-target-ncvhd/3bb3b05e-1ef5-496e-b67b-88a6aba42cbf]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition". In some cases, several ovnkube-node pods also remain in CrashLoopBackOff status.
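One way to spot the ovnkube-node pods stuck in CrashLoopBackOff is to inspect a pod list in the standard Kubernetes JSON form (for example, the output of `oc get pods -o json` for the relevant namespace, loaded with `json.load`). The sketch below assumes that input; the function name and the `name_prefix` parameter are hypothetical.

```python
def crashlooping_pods(pod_list, name_prefix="ovnkube-node"):
    """Return (namespace, pod, container) tuples for containers waiting in CrashLoopBackOff.

    `pod_list` is a parsed Kubernetes PodList object (a dict with an "items" list).
    """
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        name = meta.get("name", "")
        if not name.startswith(name_prefix):
            continue
        for cs in pod.get("status", {}).get("containerStatuses", []):
            # "waiting" is only present while the container is not running.
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append((meta.get("namespace", ""), name, cs.get("name", "")))
    return hits
```

The same list can also be read from the must-gather data if the pod objects are available there as JSON or YAML.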

This has happened in the following cases, all from DCI jobs running OCP 4.11:

- Installation of SRIOV operator: https://www.distributed-ci.io/jobs/9a0e5274-e1ca-407f-8d75-ff23ee75ce39/jobStates
- Application of a PerformanceProfile, waiting for MCP to be in the correct status: https://www.distributed-ci.io/jobs/7a658a2a-4e85-4e9e-969b-a1f69351a5fc/jobStates
- Deploying example-cnf, which is based on testpmd and involves several different operators: https://www.distributed-ci.io/jobs/1b6dcd1a-4496-4baa-89ca-46b00563d8f4/jobStates or https://www.distributed-ci.io/jobs/a66344a0-af2a-4420-8d7b-e15131ccde4e/jobStates
- Deploying a simple deployment based on a couple of pods: https://www.distributed-ci.io/jobs/b4aae3e1-2e37-480c-91f3-3f095b4292a9/jobStates
- Deploying an example of a basic operator: https://www.distributed-ci.io/jobs/07703e22-f6a6-49c9-8f9c-d1cf9276b028/jobStates

In every one of these cases, some pods report the problem named in the title. The Additional info section explains where to find the relevant data.

Expected results:

OCP cluster should be installed correctly and without OVN-related problems.

Additional info:

We used Distributed CI (DCI) to run the tests. If you open the URLs provided, you will see the jobs executed to deploy OCP 4.11 with different operators and workloads.

In the Files section of each job, the following files contain the information about the failed pods:

- events.txt: records all the events related to the pods in all namespaces. If you filter this file by the error message above, you will see the large number of pods affected by it.
- must_gather.tar.gz: complete must-gather copy of each OCP deployment, so that it is possible to retrieve more information regarding the cluster, namespaces, operators, etc.
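The filtering of events.txt described above could be sketched as follows. This assumes each event occupies one line and carries the Multus message in the form shown in this report, with a `[namespace/pod/uid]` prefix; the function and constant names are hypothetical.

```python
import re
from collections import Counter

ERROR_MARKER = "still waiting for readinessindicatorfile"
# Multus prefixes its message with "[namespace/pod/uid]".
POD_REF = re.compile(r"Multus: \[([^/\]]+)/([^/\]]+)/([^\]]+)\]")


def affected_pods(lines):
    """Count readiness-indicator failures per (namespace, pod) pair."""
    counts = Counter()
    for line in lines:
        if ERROR_MARKER not in line:
            continue
        match = POD_REF.search(line)
        if match:
            namespace, pod, _uid = match.groups()
            counts[(namespace, pod)] += 1
    return counts


# Usage, assuming events.txt was downloaded from the job's Files section:
# with open("events.txt") as fh:
#     for (namespace, pod), n in affected_pods(fh).most_common():
#         print(namespace, pod, n)
```

Counting per pod rather than per line keeps repeated sandbox-creation retries for the same pod from inflating the number of affected pods.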

Comment 2 Ramon Perez 2022-05-06 15:06:38 UTC
Adding a comment to report that I have observed this issue again with the OCP 4.11.0 build released on 2022-04-26, in this particular job: https://www.distributed-ci.io/jobs/98343ab5-1435-43c3-b6cb-e684d69feb77/files

If you look at the status of the pods in the test-cnf namespace, you will see messages like these related to the CNI network: https://www.distributed-ci.io/files/ea0cb36c-fe50-4c9f-a242-c9f5216d602a. Some examples are:

4m39s       Warning   FailedCreatePodSandBox   pod/test-7c846c6f8b-9w2pd    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-7c846c6f8b-9w2pd_test-cnf_08451e2e-c9dd-4c1a-83c1-7ead097c1866_0(1e7155242b015219048435cbd8e511f64f4028f87ee31614b9a14570bfa6d857): error adding pod test-cnf_test-7c846c6f8b-9w2pd to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [test-cnf/test-7c846c6f8b-9w2pd/08451e2e-c9dd-4c1a-83c1-7ead097c1866]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
3m8s        Warning   FailedCreatePodSandBox   pod/test-7c846c6f8b-9w2pd    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-7c846c6f8b-9w2pd_test-cnf_08451e2e-c9dd-4c1a-83c1-7ead097c1866_0(7320efc8c5d6725952e78b8477f5a2530bf3d8b8b12b60de839b2cc54a9f4829): error adding pod test-cnf_test-7c846c6f8b-9w2pd to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [test-cnf/test-7c846c6f8b-9w2pd/08451e2e-c9dd-4c1a-83c1-7ead097c1866]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
97s         Warning   FailedCreatePodSandBox   pod/test-7c846c6f8b-9w2pd    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-7c846c6f8b-9w2pd_test-cnf_08451e2e-c9dd-4c1a-83c1-7ead097c1866_0(2e07d52855ec618f3084594a1aafd95861e2b183d1d5975e6726cb7daf410763): error adding pod test-cnf_test-7c846c6f8b-9w2pd to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [test-cnf/test-7c846c6f8b-9w2pd/08451e2e-c9dd-4c1a-83c1-7ead097c1866]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition
6s          Warning   FailedCreatePodSandBox   pod/test-7c846c6f8b-9w2pd    Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_test-7c846c6f8b-9w2pd_test-cnf_08451e2e-c9dd-4c1a-83c1-7ead097c1866_0(17a5fc61500dd64e2b4b7a579fabc75e2832e5b4c237b913ec556d70ef048b32): error adding pod test-cnf_test-7c846c6f8b-9w2pd to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): Multus: [test-cnf/test-7c846c6f8b-9w2pd/08451e2e-c9dd-4c1a-83c1-7ead097c1866]: have you checked that your default network is ready? still waiting for readinessindicatorfile @ /var/run/multus/cni/net.d/10-ovn-kubernetes.conf. pollimmediate error: timed out waiting for the condition

Comment 5 Mohamed Mahmoud 2022-05-06 15:53:13 UTC

*** This bug has been marked as a duplicate of bug 2078866 ***

