Description of problem:
Creating ICNI2 SPK pods with the correct annotations fails with the error: "failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows"

Pod annotations for ICNI2:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/bfd-enabled: "true"
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.131.10.106/23"],"mac_address":"0a:58:0a:83:0a:6a","gateway_ips":["10.131.10.1"],"ip_address":"10.131.10.106/23","gateway_ip":"10.131.10.1"}}'
    k8s.ovn.org/routing-namespaces: served-ns-1
    k8s.ovn.org/routing-network: serving-ns-1/sriov-net-ens2f0-1
    k8s.v1.cni.cncf.io/networks: '[{ "name": "sriov-net-ens2f0-1", "ips": [ "192.168.217.1/21" ]}]'

Logs from the ovnkube-master pods:

I1007 19:56:57.700178 1 pods.go:343] LSP already exists for port: serving-ns-1_pod-serving-1-1-serving-job
2021-10-07T19:56:57.709Z|26993|nbctl|INFO|Running command run -- add address_set fe5dbc40-9d1d-4328-b5b8-1fa97aacc6ab addresses "\"10.131.10.106\""
I1007 19:56:57.710079 1 egressgw.go:162] External gateway pod: pod-serving-1-1-serving-job, detected for namespace(s) served-ns-1
I1007 19:56:57.710119 1 pods.go:307] [serving-ns-1/pod-serving-1-1-serving-job] addLogicalPort took 10.569509ms
E1007 19:56:57.710135 1 ovn.go:685] failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod pod-serving-1-1-serving-job: unexpected end of JSON input
I1007 19:56:57.710188 1 ovn.go:631] [0582885e-d4fb-450a-9f28-4bc0e5c09f15/serving-ns-1/pod-serving-1-1-serving-job] setup retry failed; will try again later
I1007 19:56:57.710295 1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"serving-ns-1", Name:"pod-serving-1-1-serving-job", UID:"0582885e-d4fb-450a-9f28-4bc0e5c09f15", APIVersion:"v1", ResourceVersion:"122713976", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod pod-serving-1-1-serving-job: unexpected end of JSON input

Version-Release number of selected component (if applicable):
4.7.28

How reproducible:
Always

Steps to Reproduce:
1. Create a healthy cluster
2. Create ICNI2 SPK pods with the correct annotations
3. Pods are stuck in ContainerCreating state

Actual results:
Pod creation is stuck with the error "timed out waiting for OVS flows"

Expected results:
Pods should be created and BFD sessions should be established

Additional info:
Upstream fix posted: https://github.com/ovn-org/ovn-kubernetes/pull/2551
We have verified this bug is not present on 4.9.7.
We have also verified that this bug is not present on 4.10.0-0.nightly-2021-10-21-105053.
The fix landed with the downstream merge: https://github.com/openshift/ovn-kubernetes/pull/834. Moving state to MODIFIED.
I'm seeing this bug with 4.9.17. Pod creation gets stuck in ContainerCreatin...

# oc get pods
NAME      READY   STATUS              RESTARTS   AGE
ext-gw2   0/1     ContainerCreating   0          45s

and these messages in the events:

65s   Normal    Scheduled                pod/ext-gw2   Successfully assigned frr/ext-gw2 to w1
65s   Warning   ErrorAddingLogicalPort   pod/ext-gw2   failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod ext-gw2: unexpected end of JSON input
2s    Warning   FailedCreatePodSandBox   pod/ext-gw2   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ext-gw2_frr_166d88a5-de73-4209-8fdf-58f0ab267132_0(c432d7e44ad3754997c28d1f3d0b302e3f65ef20edc67ff00c2a7b7bc2beca55): error adding pod frr_ext-gw2 to CNI network "multus-cni-network": [frr/ext-gw2/166d88a5-de73-4209-8fdf-58f0ab267132:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[frr/ext-gw2 c432d7e44ad3754997c28d1f3d0b302e3f65ef20edc67ff00c2a7b7bc2beca55] [frr/ext-gw2 c432d7e44ad3754997c28d1f3d0b302e3f65ef20edc67ff00c2a7b7bc2beca55] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:01:09 [10.131.1.9/23]...
The fix was openshift/ovn-kubernetes/commit/db62abff899d8a55dfa04a9a02167d28046c1bc1.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056