Description of problem:
Run the route-perf scale job on a heterogeneous OCP cluster with 120 worker nodes using the OVN network.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Scale the cluster to 120 worker nodes
2. Execute the route-perf job

Actual results:
A lot of pods failed to be created.

Expected results:
The job can be executed successfully.

Additional info:
Some OVN pod logs/information:

Warning NetworkNotReady 84m (x18 over 85m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
Warning FailedCreatePodSandBox 79m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-check-target-nq7cv_openshift-network-diagnostics_fb379484-8a04-4773-b58b-d67a442e5591_0(5842cecac99e2ae780dff8e6a6f2646c6d40b4de5c8ecae95adb4302e31c3b67): error adding pod openshift-network-diagnostics_network-check-target-nq7cv to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-network-diagnostics/network-check-target-nq7cv/fb379484-8a04-4773-b58b-d67a442e5591:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-network-diagnostics/network-check-target-nq7cv 5842cecac99e2ae780dff8e6a6f2646c6d40b4de5c8ecae95adb4302e31c3b67] [openshift-network-diagnostics/network-check-target-nq7cv 5842cecac99e2ae780dff8e6a6f2646c6d40b4de5c8ecae95adb4302e31c3b67] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:1c:03 [10.131.28.3/23] '

oc logs -f ovnkube-node-lk8xs -n openshift-ovn-kubernetes -c ovn-controller
2022-06-28T12:39:21.317Z|00813|ovsdb_cs|INFO|ssl:10.0.145.48:9642: clustered database server is disconnected from cluster; trying another server
2022-06-28T12:39:21.318Z|00814|main|INFO|OVNSB commit failed, force recompute next time.
2022-06-28T12:39:21.318Z|00815|reconnect|INFO|ssl:10.0.145.48:9642: connection attempt timed out
2022-06-28T12:39:21.320Z|00816|reconnect|INFO|ssl:10.0.207.143:9642: connecting...
2022-06-28T12:39:21.756Z|00817|reconnect|INFO|ssl:10.0.207.143:9642: connected
2022-06-28T12:39:22.288Z|00818|inc_proc_eng|INFO|node: logical_flow_output, recompute (forced) took 517ms

116m Warning FailedCreatePodSandBox pod/network-metrics-daemon-tzz7j Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-metrics-daemon-tzz7j_openshift-multus_dfff197d-3cb0-4fd2-9bc3-72901e18a2a9_0(317f24ba0684e7fa6b1f07d92d7b3f3bce215adcee2dcfcb6f35cd7960ce6e4e): error adding pod openshift-multus_network-metrics-daemon-tzz7j to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-multus/network-metrics-daemon-tzz7j/dfff197d-3cb0-4fd2-9bc3-72901e18a2a9:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-multus/network-metrics-daemon-tzz7j 317f24ba0684e7fa6b1f07d92d7b3f3bce215adcee2dcfcb6f35cd7960ce6e4e] [openshift-multus/network-metrics-daemon-tzz7j 317f24ba0684e7fa6b1f07d92d7b3f3bce215adcee2dcfcb6f35cd7960ce6e4e] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:81:1c:04 [10.129.28.4/23] '

oc get pods -n openshift-multus -owide |grep ip-10-0-169-235.us-east-2.compute.internal
multus-additional-cni-plugins-t4pgq   1/1   Running             0   127m   10.0.169.235   ip-10-0-169-235.us-east-2.compute.internal   <none>   <none>
multus-m5scr                          1/1   Running             0   127m   10.0.169.235   ip-10-0-169-235.us-east-2.compute.internal   <none>   <none>
network-metrics-daemon-dlh5h          0/2   ContainerCreating   0   127m   <none>         ip-10-0-169-235.us-east-2.compute.internal   <none>   <none>

oc logs -f multus-m5scr -n openshift-multus
2022-06-28T11:10:09+00:00 [cnibincopy] Successfully copied files in /usr/src/multus-cni/rhel8/bin/ to /host/opt/cni/bin/upgrade_4616f50f-0485-43ce-a4e8-1ae8b0bfb069
2022-06-28T11:10:09+00:00 [cnibincopy] Successfully moved files in /host/opt/cni/bin/upgrade_4616f50f-0485-43ce-a4e8-1ae8b0bfb069 to /host/opt/cni/bin/
2022-06-28T11:10:09+00:00 WARN: {unknown parameter "-"}
2022-06-28T11:10:09+00:00 Entrypoint skipped copying Multus binary.
2022-06-28T11:10:09+00:00 Generating Multus configuration file using files in /host/var/run/multus/cni/net.d...
2022-06-28T11:10:09+00:00 Attempting to find master plugin configuration, attempt 0
2022-06-28T11:10:14+00:00 Attempting to find master plugin configuration, attempt 5
2022-06-28T11:10:18+00:00 Using MASTER_PLUGIN: 10-ovn-kubernetes.conf
2022-06-28T11:10:18+00:00 Nested capabilities string:
2022-06-28T11:10:18+00:00 Using /host/var/run/multus/cni/net.d/10-ovn-kubernetes.conf as a source to generate the Multus configuration
2022-06-28T11:10:18+00:00 Config file created @ /host/etc/cni/net.d/00-multus.conf
{
  "cniVersion": "0.3.1",
  "name": "multus-cni-network",
  "type": "multus",
  "namespaceIsolation": true,
  "globalNamespaces": "default,openshift-multus,openshift-sriov-network-operator",
  "logLevel": "verbose",
  "binDir": "/opt/multus/bin",
  "readinessindicatorfile": "/var/run/multus/cni/net.d/10-ovn-kubernetes.conf",
  "kubeconfig": "/etc/kubernetes/cni/net.d/multus.d/multus.kubeconfig",
  "delegates": [
    {"cniVersion":"0.4.0","name":"ovn-kubernetes","type":"ovn-k8s-cni-overlay","ipam":{},"dns":{},"logFile":"/var/log/ovn-kubernetes/ovn-k8s-cni-overlay.log","logLevel":"4","logfile-maxsize":100,"logfile-maxbackups":5,"logfile-maxage":5}
  ]
}
2022-06-28T11:10:18+00:00 Entering watch loop...

oc describe pod network-metrics-daemon-dlh5h -n openshift-multus
Name:                 network-metrics-daemon-dlh5h
Namespace:            openshift-multus
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 ip-10-0-169-235.us-east-2.compute.internal/10.0.169.235
Start Time:           Tue, 28 Jun 2022 11:09:58 +0000
Labels:               app=network-metrics-daemon
                      component=network
                      controller-revision-hash=55897ff588
                      openshift.io/component=network
                      pod-template-generation=1
                      type=infra
Annotations:          k8s.ovn.org/pod-networks:
                        {"default":{"ip_addresses":["10.129.18.4/23"],"mac_address":"0a:58:0a:81:12:04","gateway_ips":["10.129.18.1"],"ip_address":"10.129.18.4/23...
Status:               Pending
IP:
IPs:                  <none>
Controlled By:        DaemonSet/network-metrics-daemon
Containers:
  network-metrics-daemon:
    Container ID:
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c66bc33d936ebe73656b85f4dd498266880ebbd10ba4210586109b36e36a6258
    Image ID:
    Port:           <none>
    Host Port:      <none>
    Command:
      /usr/bin/network-metrics
    Args:
      --node-name
      $(NODE_NAME)
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  100Mi
    Environment:
      NODE_NAME:   (v1:spec.nodeName)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-496mw (ro)
  kube-rbac-proxy:
    Container ID:
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b618601c08f13a78710a25221deeb31f4a9281acb0947e3c269984cff706d932
    Image ID:
    Port:           8443/TCP
    Host Port:      0/TCP
    Args:
      --logtostderr
      --secure-listen-address=:8443
      --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256
      --upstream=http://127.0.0.1:9091/
      --tls-private-key-file=/etc/metrics/tls.key
      --tls-cert-file=/etc/metrics/tls.crt
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Requests:
      cpu:     10m
      memory:  20Mi
    Environment:  <none>
    Mounts:
      /etc/metrics from metrics-certs (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-496mw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  metrics-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  metrics-daemon-secret
    Optional:    false
  kube-api-access-496mw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     op=Exists
Events:
  Type Reason Age From Message
  --------------------------------------------------------------------------------
  Normal Scheduled 130m default-scheduler Successfully assigned openshift-multus/network-metrics-daemon-dlh5h to ip-10-0-169-235.us-east-2.compute.internal by ip-10-0-145-48
  Warning ErrorAddingLogicalPort 130m (x3 over 130m) controlplane addLogicalPort failed for openshift-multus/network-metrics-daemon-dlh5h: unable to parse node L3 gw annotation: k8s.ovn.org/l3-gateway-config annotation not found for node "ip-10-0-169-235.us-east-2.compute.internal"
  Warning FailedMount 130m (x6 over 130m) kubelet MountVolume.SetUp failed for volume "metrics-certs" : object "openshift-multus"/"metrics-daemon-secret" not registered
  Warning NetworkNotReady 130m (x11 over 130m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: No CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?
  Warning FailedCreatePodSandBox 127m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-metrics-daemon-dlh5h_openshift-multus_4eec4f3e-0216-493c-8e24-6b71ef3133e0_0(869809c73e7f3303f68e9b9b11a5c1432b59bba7c72d525099109ed6ad43f3d4): error adding pod openshift-multus_network-metrics-daemon-dlh5h to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-multus/network-metrics-daemon-dlh5h/4eec4f3e-0216-493c-8e24-6b71ef3133e0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-multus/network-metrics-daemon-dlh5h 869809c73e7f3303f68e9b9b11a5c1432b59bba7c72d525099109ed6ad43f3d4] [openshift-multus/network-metrics-daemon-dlh5h 869809c73e7f3303f68e9b9b11a5c1432b59bba7c72d525099109ed6ad43f3d4] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:81:12:04 [10.129.18.4/23] '
  Warning FailedCreatePodSandBox 125m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_network-metrics-daemon-dlh5h_openshift-multus_4eec4f3e-0216-493c-8e24-6b71ef3133e0_0(7194779e01eff0316e777760d9cea6c8dc59f0178e7a98adddfa26bbdf7398fa): error adding pod openshift-multus_network-metrics-daemon-dlh5h to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-multus/network-metrics-daemon-dlh5h/4eec4f3e-0216-493c-8e24-6b71ef3133e0:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-multus/network-metrics-daemon-dlh5h 7194779e01eff0316e777760d9cea6c8dc59f0178e7a98adddfa26bbdf7398fa] [openshift-multus/network-metrics-daemon-dlh5h 7194779e01eff0316e777760d9cea6c8dc59f0178e7a98adddfa26bbdf7398fa] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:81:12:04 [10.129.18.4/23] '

oc logs -f ovnkube-node-vgkzv -n openshift-ovn-kubernetes -c ovn-controller
2022-06-28T11:33:10.039Z|00504|poll_loop|INFO|Dropped 23 log messages in last 78 seconds (most recently, 77 seconds ago) due to excessive rate
2022-06-28T11:33:10.039Z|00505|poll_loop|INFO|wakeup due to [POLLIN] on fd 16 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (99% CPU usage)
2022-06-28T11:33:10.039Z|00506|poll_loop|INFO|wakeup due to [POLLIN][POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.090Z|00507|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.107Z|00508|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.112Z|00509|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.115Z|00510|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.119Z|00511|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.122Z|00512|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.125Z|00513|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:10.129Z|00514|poll_loop|INFO|wakeup due to [POLLOUT] on fd 21 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (99% CPU usage)
2022-06-28T11:33:39.546Z|00515|lflow_cache|INFO|Detected cache inactivity (last active 30006 ms ago): trimming cache
2022-06-28T11:34:53.377Z|00516|ovsdb_cs|INFO|ssl:10.0.172.17:9642: clustered database server is disconnected from cluster; trying another server
2022-06-28T11:34:53.379Z|00517|reconnect|INFO|ssl:10.0.172.17:9642: connection attempt timed out
2022-06-28T11:34:53.380Z|00518|main|INFO|OVNSB commit failed, force recompute next time.

Last State: Terminated
  Reason: Error
  Message: Cert: CACert: CertCommonName: Scheme: ElectionTimer:0 northbound:false exec:<nil>} Gateway:{Mode:shared Interface: EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 DisablePacketMTUCheck:false RouterSubnet:} MasterHA:{ElectionLeaseDuration:60 ElectionRenewDeadline:30 ElectionRetryPeriod:20} HybridOverlay:{Enabled:false RawClusterSubnets: ClusterSubnets:[] VXLANPort:4789} OvnKubeNode:{Mode:full MgmtPortNetdev: DisableOVNIfaceIdVer:false}}
I0628 10:31:02.316819 1 client.go:325] "msg"="trying to connect" "database"="OVN_Northbound" "endpoint"="ssl:10.0.145.48:9641"
I0628 10:31:02.328649 1 client.go:781] "msg"="transacting operations" "database"="_Server" "operations"="[{Op:select Table:Database Row:map[] Rows:[] Columns:[name model leader sid] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]"
I0628 10:31:02.329575 1 client.go:260] "msg"="successfully connected" "database"="OVN_Northbound" "endpoint"="ssl:10.0.145.48:9641" "sid"="aa5bc698-feb5-4bb3-9470-7be6df5b7f15"
I0628 10:31:02.332711 1 client.go:325] "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.145.48:9642"
I0628 10:31:02.333475 1 client.go:325] "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.172.17:9642"
I0628 10:31:02.333577 1 client.go:325] "msg"="trying to connect" "database"="OVN_Southbound" "endpoint"="ssl:10.0.207.143:9642"
F0628 10:31:02.334001 1 ovnkube.go:133] error when trying to initialize libovsdb SB client: unable to connect to any endpoints: failed to connect to ssl:10.0.145.48:9642: failed to open connection: dial tcp 10.0.145.48:9642: connect: connection refused. failed to connect to ssl:10.0.172.17:9642: failed to open connection: dial tcp 10.0.172.17:9642: connect: connection refused. failed to connect to ssl:10.0.207.143:9642: failed to open connection: dial tcp 10.0.207.143:9642: connect: connection refused
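
For triage, the repeated "timed out waiting for OVS port binding (ovn-installed)" events quoted above could be tallied per pod from saved `oc describe pod` or `oc get events` output. The following is only a minimal sketch: the function name `summarize_timeouts` is hypothetical, and the regular expression assumes the exact message wording seen in this report, which may differ in other releases.

```python
import re
from collections import Counter

# Matches the failure text quoted in this report. The tempered token
# (?:(?!error adding pod).)*? keeps one match from running into the
# next event when events are concatenated in a single blob of text.
PORT_BINDING_TIMEOUT = re.compile(
    r"error adding pod (?P<pod>\S+) to CNI network "
    r"(?:(?!error adding pod).)*?"
    r"timed out waiting for OVS port binding \(ovn-installed\) "
    r"for (?P<mac>(?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}) "
    r"\[(?P<ip>[^\]]+)\]",
    re.DOTALL,
)


def summarize_timeouts(text):
    """Count port-binding timeouts, keyed by (pod, mac, ip)."""
    return Counter(
        (m.group("pod"), m.group("mac"), m.group("ip"))
        for m in PORT_BINDING_TIMEOUT.finditer(text)
    )
```

Feeding it the saved events text and printing `summarize_timeouts(text).most_common()` would list which pods hit the timeout most often; it does not say anything about why the southbound database connection dropped.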