Bug 2011971 - ICNI2 pods are stuck in ContainerCreating state
Summary: ICNI2 pods are stuck in ContainerCreating state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: All
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Surya Seetharaman
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks: 2054139
Reported: 2021-10-07 20:02 UTC by Murali Krishnasamy
Modified: 2022-03-10 16:18 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2054139
Environment:
Last Closed: 2022-03-10 16:18:05 UTC
Target Upstream Version:
Embargoed:


Links
System                  ID                                                                        Last Updated
Github                  openshift/ovn-kubernetes/commit/db62abff899d8a55dfa04a9a02167d28046c1bc1  2022-02-07 17:00:43 UTC
Red Hat Product Errata  RHSA-2022:0056                                                            2022-03-10 16:18:31 UTC

Description Murali Krishnasamy 2021-10-07 20:02:56 UTC
Description of problem:
Creating ICNI2 SPK pods with the correct annotations fails with the error "failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows".

Pod annotations for ICNI2:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/bfd-enabled: "true"
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.131.10.106/23"],"mac_address":"0a:58:0a:83:0a:6a","gateway_ips":["10.131.10.1"],"ip_address":"10.131.10.106/23","gateway_ip":"10.131.10.1"}}'
    k8s.ovn.org/routing-namespaces: served-ns-1
    k8s.ovn.org/routing-network: serving-ns-1/sriov-net-ens2f0-1
    k8s.v1.cni.cncf.io/networks: '[{ "name": "sriov-net-ens2f0-1", "ips": [ "192.168.217.1/21" ]}]'

Logs from the ovnkube-master pods:

I1007 19:56:57.700178       1 pods.go:343] LSP already exists for port: serving-ns-1_pod-serving-1-1-serving-job
2021-10-07T19:56:57.709Z|26993|nbctl|INFO|Running command run -- add address_set fe5dbc40-9d1d-4328-b5b8-1fa97aacc6ab addresses "\"10.131.10.106\""
I1007 19:56:57.710079       1 egressgw.go:162] External gateway pod: pod-serving-1-1-serving-job, detected for namespace(s) served-ns-1
I1007 19:56:57.710119       1 pods.go:307] [serving-ns-1/pod-serving-1-1-serving-job] addLogicalPort took 10.569509ms
E1007 19:56:57.710135       1 ovn.go:685] failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod pod-serving-1-1-serving-job: unexpected end of JSON input
I1007 19:56:57.710188       1 ovn.go:631] [0582885e-d4fb-450a-9f28-4bc0e5c09f15/serving-ns-1/pod-serving-1-1-serving-job] setup retry failed; will try again later
I1007 19:56:57.710295       1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"serving-ns-1", Name:"pod-serving-1-1-serving-job", UID:"0582885e-d4fb-450a-9f28-4bc0e5c09f15", APIVersion:"v1", ResourceVersion:"122713976", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod pod-serving-1-1-serving-job: unexpected end of JSON input

Version-Release number of selected component (if applicable):
4.7.28

How reproducible:
Always

Steps to Reproduce:
1. Create a healthy cluster.
2. Create ICNI2 SPK pods with the correct annotations.
3. Observe that the pods are stuck in the ContainerCreating state.

Actual results:
Pod creation is stuck with the error "timed out waiting for OVS flows".

Expected results:
The pod should be created and BFD sessions should be established.


Additional info:

Comment 1 Surya Seetharaman 2021-10-11 10:49:23 UTC
Upstream fix posted: https://github.com/ovn-org/ovn-kubernetes/pull/2551

Comment 3 Jose Castillo Lema 2021-11-16 14:41:38 UTC
We have verified this bug is not present on 4.9.7.

Comment 4 Jose Castillo Lema 2021-11-16 15:33:12 UTC
We also have verified this bug is not present on 4.10.0-0.nightly-2021-10-21-105053

Comment 5 Surya Seetharaman 2021-12-01 18:01:12 UTC
The fix landed downstream as part of the downstream merge: https://github.com/openshift/ovn-kubernetes/pull/834. Moving state to MODIFIED.

Comment 9 William Caban 2022-02-07 14:35:22 UTC
I'm seeing this bug on 4.9.17.

Pod creation gets stuck in ContainerCreatin...

# oc get pods
NAME      READY   STATUS              RESTARTS   AGE
ext-gw2   0/1     ContainerCreating   0          45s


and these messages appear in the events:

65s         Normal    Scheduled                pod/ext-gw2   Successfully assigned frr/ext-gw2 to w1
65s         Warning   ErrorAddingLogicalPort   pod/ext-gw2   failed to handle external GW check: unable to unmarshall annotation k8s.v1.cni.cncf.io/network-status on pod ext-gw2: unexpected end of JSON input
2s          Warning   FailedCreatePodSandBox   pod/ext-gw2   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_ext-gw2_frr_166d88a5-de73-4209-8fdf-58f0ab267132_0(c432d7e44ad3754997c28d1f3d0b302e3f65ef20edc67ff00c2a7b7bc2beca55): error adding pod frr_ext-gw2 to CNI network "multus-cni-network": [frr/ext-gw2/166d88a5-de73-4209-8fdf-58f0ab267132:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[frr/ext-gw2 c432d7e44ad3754997c28d1f3d0b302e3f65ef20edc67ff00c2a7b7bc2beca55] [frr/ext-gw2 c432d7e44ad3754997c28d1f3d0b302e3f65ef20edc67ff00c2a7b7bc2beca55] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:01:09 [10.131.1.9/23]...

Comment 11 Surya Seetharaman 2022-02-21 20:51:33 UTC
openshift/ovn-kubernetes/commit/db62abff899d8a55dfa04a9a02167d28046c1bc1 was the fix

Comment 13 errata-xmlrpc 2022-03-10 16:18:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

