Description of problem:

Churning an ICNI2 density workload on OpenShift 4.10 causes pods to get stuck in the `ContainerCreating` state forever, with a warning event that says:

  Warning  ErrorAddingLogicalPort  53s (x9 over 13m)  controlplane  failed to handle external GW check: unable to add pod: serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52 as external gateway for namespace: served-ns-33, error: duplicate IP found in ECMP Pod route cache! IP: "192.168.222.33", first pod: "dep-serving-33-4-serving-job-66f7948f5d-dzx52", second pod: "serving-ns-33_dep-serving-33-4-serving-job-66f7948f5d-dzx52"

Version-Release number of selected component (if applicable):
4.10.0-rc.6
ovn21.12-central-21.12.0-25.el8fdp.x86_64
ovn21.12-vtep-21.12.0-25.el8fdp.x86_64
ovn21.12-21.12.0-25.el8fdp.x86_64
ovn21.12-host-21.12.0-25.el8fdp.x86_64

How reproducible:
Always, when churning the ICNI2 workload.

Steps to Reproduce:
1. Deploy a healthy bare-metal cluster (120 nodes in this case) with OVN in shared gateway mode.
2. Create the ICNI2 workload with serving and served pods; verify all BFD sessions and ECMP routes.
3. Delete and recreate the workload a couple of times, and restart the OVN master pods to force a leader change.
4. Re-create the ICNI2 workload and watch for stuck pods or `ErrorAddingLogicalPort` messages in the OVN master logs.

Actual results:
Pods get stuck while adding their logical port because of a duplicate IP in the ECMP pod route cache.

Expected results:
The pods should run like normal application pods, without any error.

Additional info:
This happened only after restarting the OVN master pods.
Spoke to Tim: as we can see from the logs below, we add the key to the cache and the actual transact happens much later; if that transact fails, we retry the pod setup, try to add the pod to the cache again, and hit the duplicate pod IP error.

Tim might be able to explain it better, so I'm assigning this to him. We had a few options for fixing this, but since this situation is unlikely to happen that often in a user environment, we went with the simplest fix: checking for the case where the same pod is being retried. (A minimal sketch of this sequence is included after the logs below.)

============================== Round 1 ==============================

I0303 15:45:44.617308 1 pods.go:332] [serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52] creating logical port for pod on switch worker003-fc640
I0303 15:45:44.617409 1 kube.go:73] Setting annotations map[k8s.ovn.org/pod-networks:{"default":{"ip_addresses":["10.128.16.48/23"],"mac_address":"0a:58:0a:80:10:30","gateway_ips":["10.128.16.1"],"ip_address":"10.128.16.48/23","gateway_ip":"10.128.16.1"}}] on pod serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52
==
I0303 15:45:44.629398 1 egressgw.go:155] External gateway pod: dep-serving-33-4-serving-job-66f7948f5d-dzx52, detected for namespace(s) served-ns-33
I0303 15:45:44.629444 1 egressgw.go:203] Adding routes for external gateway pod: dep-serving-33-4-serving-job-66f7948f5d-dzx52, next hops: "192.168.222.33", namespace: served-ns-33, bfd-enabled: true
==
2022-03-03T15:45:44.630Z|00065|ovsdb_cs|INFO|ssl:192.168.216.12:9641: clustered database server is not cluster leader; trying another server
2022-03-03T15:45:44.630Z|00066|reconnect|INFO|ssl:192.168.216.12:9641: connection attempt timed out
2022-03-03T15:45:44.630Z|00067|reconnect|INFO|ssl:192.168.216.11:9641: connecting...
2022-03-03T15:45:44.634Z|00068|reconnect|INFO|ssl:192.168.216.11:9641: connected
====
I0303 15:45:44.732939 1 client.go:1053] "msg"="endpoint lost leader, reconnecting" "database"="OVN_Northbound" "endpoint"="ssl:192.168.216.12:9641" "sid"="d627d068-0e72-4397-80f4-22b922bee61e"
I0303 15:45:44.733110 1 client.go:1137] "msg"="connection lost, reconnecting" "database"="OVN_Northbound" "endpoint"="ssl:192.168.216.10:9641"
I0303 15:45:44.733660 1 transact.go:41] Configuring OVN: [{Op:insert Table:BFD Row:map[dst_ip:192.168.222.33 logical_port:rtoe-GR_worker023-fc640] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996875} {Op:insert Table:Logical_Router_Static_Route Row:map[bfd:{GoSet:[{GoUUID:u2596996875}]} ip_prefix:10.130.30.8/32 nexthop:192.168.222.33 options:{GoMap:map[ecmp_symmetric_reply:true]} output_port:{GoSet:[rtoe-GR_worker023-fc640]} policy:{GoSet:[src-ip]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996876} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:static_routes Mutator:insert Value:{GoSet:[{GoUUID:u2596996876}]}}] Timeout:<nil> Where:[where column _uuid == {37882d04-18f4-4b88-b87e-0d245145be44}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]
===
2022-03-03T15:45:52.863Z|00069|ovsdb_cs|INFO|ssl:192.168.216.11:9641: clustered database server is not cluster leader; trying another server
2022-03-03T15:45:52.863Z|00070|reconnect|INFO|ssl:192.168.216.11:9641: connection attempt timed out
2022-03-03T15:45:52.863Z|00071|reconnect|INFO|ssl:192.168.216.10:9641: connecting...
2022-03-03T15:45:52.869Z|00072|reconnect|INFO|ssl:192.168.216.10:9641: connected
2022-03-03T15:45:52.870Z|00073|ovsdb_cs|INFO|ssl:192.168.216.10:9641: clustered database server is not cluster leader; trying another server
2022-03-03T15:45:52.870Z|00074|reconnect|INFO|ssl:192.168.216.10:9641: connection attempt timed out
2022-03-03T15:45:52.870Z|00075|reconnect|INFO|ssl:192.168.216.12:9641: connecting...
2022-03-03T15:45:52.874Z|00076|reconnect|INFO|ssl:192.168.216.12:9641: connected
I0303 15:46:04.734211 1 pods.go:322] [serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52] addLogicalPort took 20.116904396s
====
E0303 15:46:04.734238 1 ovn.go:648] failed to handle external GW check: unable to add src-ip route to GR router, err: error in transact with ops [{Op:insert Table:BFD Row:map[dst_ip:192.168.222.33 logical_port:rtoe-GR_worker023-fc640] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996875} {Op:insert Table:Logical_Router_Static_Route Row:map[bfd:{GoSet:[{GoUUID:u2596996875}]} ip_prefix:10.130.30.8/32 nexthop:192.168.222.33 options:{GoMap:map[ecmp_symmetric_reply:true]} output_port:{GoSet:[rtoe-GR_worker023-fc640]} policy:{GoSet:[src-ip]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996876} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:static_routes Mutator:insert Value:{GoSet:[{GoUUID:u2596996876}]}}] Timeout:<nil> Where:[where column _uuid == {37882d04-18f4-4b88-b87e-0d245145be44}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]: context deadline exceeded: while awaiting reconnection

============================== Round 2 ==============================

I0303 15:46:04.734287 1 pods.go:332] [serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52] creating logical port for pod on switch worker003-fc640
I0303 15:46:04.734406 1 client.go:1248] "msg"="warning: unable to ensure cache consistency for reading" "database"="OVN_Northbound"
I0303 15:46:04.734408 1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"serving-ns-33", Name:"dep-serving-33-4-serving-job-66f7948f5d-dzx52", UID:"1b25bb96-66f8-401d-b063-1c90eec18ad5", APIVersion:"v1", ResourceVersion:"5071427", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' failed to handle external GW check: unable to add src-ip route to GR router, err: error in transact with ops [{Op:insert Table:BFD Row:map[dst_ip:192.168.222.33 logical_port:rtoe-GR_worker023-fc640] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996875} {Op:insert Table:Logical_Router_Static_Route Row:map[bfd:{GoSet:[{GoUUID:u2596996875}]} ip_prefix:10.130.30.8/32 nexthop:192.168.222.33 options:{GoMap:map[ecmp_symmetric_reply:true]} output_port:{GoSet:[rtoe-GR_worker023-fc640]} policy:{GoSet:[src-ip]}] Rows:[] Columns:[] Mutations:[] Timeout:<nil> Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:u2596996876} {Op:mutate Table:Logical_Router Row:map[] Rows:[] Columns:[] Mutations:[{Column:static_routes Mutator:insert Value:{GoSet:[{GoUUID:u2596996876}]}}] Timeout:<nil> Where:[where column _uuid == {37882d04-18f4-4b88-b87e-0d245145be44}] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]: context deadline exceeded: while awaiting reconnection
=========
I0303 15:46:05.162958 1 egressgw.go:155] External gateway pod: dep-serving-33-4-serving-job-66f7948f5d-dzx52, detected for namespace(s) served-ns-33
I0303 15:46:05.162994 1 pods.go:322] [serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52] addLogicalPort took 428.709552ms
E0303 15:46:05.163003 1 ovn.go:648] failed to handle external GW check: unable to add pod: serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52 as external gateway for namespace: served-ns-33, error: duplicate IP found in ECMP Pod route cache! IP: "192.168.222.33", first pod: "dep-serving-33-4-serving-job-66f7948f5d-dzx52", second pod: "serving-ns-33_dep-serving-33-4-serving-job-66f7948f5d-dzx52"
I0303 15:46:05.163010 1 egressgw.go:155] External gateway pod: dep-serving-33-2-serving-job-dc6c6d4c8-9jnnr, detected for namespace(s) served-ns-33
I0303 15:46:05.163043 1 pods.go:332] [serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52] creating logical port for pod on switch worker003-fc640
I0303 15:46:05.163108 1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"serving-ns-33", Name:"dep-serving-33-4-serving-job-66f7948f5d-dzx52", UID:"1b25bb96-66f8-401d-b063-1c90eec18ad5", APIVersion:"v1", ResourceVersion:"5071434", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' failed to handle external GW check: unable to add pod: serving-ns-33/dep-serving-33-4-serving-job-66f7948f5d-dzx52 as external gateway for namespace: served-ns-33, error: duplicate IP found in ECMP Pod route cache! IP: "192.168.222.33", first pod: "dep-serving-33-4-serving-job-66f7948f5d-dzx52", second pod: "serving-ns-33_dep-serving-33-4-serving-job-66f7948f5d-dzx52"
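To make the sequence above easier to follow, here is a minimal, purely illustrative Go sketch of the ordering described in this comment; the function and type names are invented and do not correspond to the actual ovn-kubernetes code. The gateway IP is recorded in the route cache before the OVSDB transact runs, the transact fails while the NB database has no leader, and the pod-setup retry then trips the duplicate-IP check on the stale cache entry.

package main

import (
	"errors"
	"fmt"
)

// routeCache maps a gateway (next-hop) IP to the pod that registered it.
type routeCache map[string]string

// addExternalGWPod mimics the problematic ordering described above: the
// cache entry is written first, and the OVSDB transact runs afterwards.
// A failed transact does not roll the cache entry back.
func addExternalGWPod(cache routeCache, gwIP, podKey string, transact func() error) error {
	if first, ok := cache[gwIP]; ok {
		return fmt.Errorf("duplicate IP found in ECMP Pod route cache! IP: %q, first pod: %q, second pod: %q",
			gwIP, first, podKey)
	}
	cache[gwIP] = podKey // key added to the cache before the transact
	return transact()    // may time out while the NB DB has no leader
}

func main() {
	cache := routeCache{}

	// Round 1: the transact times out (as in the logs above), but the
	// cache entry for 192.168.222.33 is left behind.
	err := addExternalGWPod(cache, "192.168.222.33",
		"dep-serving-33-4-serving-job-66f7948f5d-dzx52",
		func() error {
			return errors.New("context deadline exceeded: while awaiting reconnection")
		})
	fmt.Println("round 1:", err)

	// Round 2: the pod setup is retried (this time under a namespaced
	// key) and the stale entry triggers the duplicate-IP error.
	err = addExternalGWPod(cache, "192.168.222.33",
		"serving-ns-33_dep-serving-33-4-serving-job-66f7948f5d-dzx52",
		func() error { return nil })
	fmt.Println("round 2:", err)
}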
(In reply to Surya Seetharaman from comment #2)
> Spoke to Tim: as we can see from the logs below, we add the key to the
> cache and the actual transact happens much later; if that transact fails,
> we retry the pod setup, try to add the pod to the cache again, and hit
> the duplicate pod IP error.
>
> Tim might be able to explain it better, so I'm assigning this to him. We
> had a few options for fixing this, but since this situation is unlikely
> to happen that often in a user environment, we went with the simplest
> fix: checking for the case where the same pod is being retried.

Wait, I take this back, actually: my understanding is that if we fail for whatever reason during an addLogicalPort of an exgw serving pod, we'd hit this bug, i.e. on any adds of routes/policies/BFD after that add-to-cache event. Maybe it's not a "rare" event :)
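For illustration only, here is a hedged Go sketch of the "check for the same pod retrying" idea mentioned in comment #2; this is not the actual change in the linked PR, and all names below are invented. The duplicate-IP check only errors out when a different pod already owns the gateway IP, so a retry of the same pod after a failed transact can proceed.

package main

import "fmt"

// gwEntry records which pod registered a gateway IP.
type gwEntry struct {
	namespace string
	name      string
}

// routeCache maps a gateway IP to the pod that registered it.
type routeCache map[string]gwEntry

// addPodGW errors out only if a *different* pod already owns the IP; a
// retry of the same pod (e.g. after a failed transact) is treated as a
// no-op so the route setup can be attempted again.
func (c routeCache) addPodGW(gwIP, namespace, name string) error {
	if existing, ok := c[gwIP]; ok {
		if existing.namespace == namespace && existing.name == name {
			return nil // same pod retrying: keep the entry, no error
		}
		return fmt.Errorf("duplicate IP found in ECMP Pod route cache! IP: %q, first pod: %q, second pod: %q",
			gwIP, existing.name, name)
	}
	c[gwIP] = gwEntry{namespace: namespace, name: name}
	return nil
}

func main() {
	cache := routeCache{}
	// First attempt: the entry is cached even though the transact later fails.
	fmt.Println(cache.addPodGW("192.168.222.33", "serving-ns-33", "dep-serving-33-4-serving-job-66f7948f5d-dzx52"))
	// Retry of the same pod is now accepted instead of erroring out.
	fmt.Println(cache.addPodGW("192.168.222.33", "serving-ns-33", "dep-serving-33-4-serving-job-66f7948f5d-dzx52"))
	// A genuinely different pod claiming the same IP is still rejected.
	fmt.Println(cache.addPodGW("192.168.222.33", "serving-ns-33", "dep-serving-33-2-serving-job-dc6c6d4c8-9jnnr"))
}

The sketch only captures the cache-side comparison; the real code also has to keep the NB database state consistent across retries.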
Tested this patch (https://github.com/openshift/ovn-kubernetes/pull/990) and it worked: I was able to churn the ICNI2 workload on the 120-node environment without the addLogicalPort error.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069