Created attachment 1841293 [details]
ovnkube-master leader logs

Description of problem:

Pods were stuck in creating and crashloop. Describing the pods showed duplicate ECMP route errors, and pod setup finally timed out waiting for OVS flows:

Warning FailedCreatePodSandBox 3m53s (x367 over 145m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_splunk-operator-5f74449b7c-vd99x_thingspace_146197bb-2021-4858-a5b8-2b4689b33494_0(1705c5031d6e8319ea820b68fa8cb441d88d53899e4dd8c63320377adba59095): error adding pod thingspace_splunk-operator-5f74449b7c-vd99x to CNI network "multus-cni-network": [thingspace/splunk-operator-5f74449b7c-vd99x:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[thingspace/splunk-operator-5f74449b7c-vd99x 1705c5031d6e8319ea820b68fa8cb441d88d53899e4dd8c63320377adba59095] [thingspace/splunk-operator-5f74449b7c-vd99x 1705c5031d6e8319ea820b68fa8cb441d88d53899e4dd8c63320377adba59095] failed to configure pod interface: error while waiting on flows for pod: timed out waiting for OVS flows '

Warning ErrorAddingLogicalPort 3m45s (x148 over 149m) controlplane unable to add external gwStr src-ip route to GR router, stderr:"ovn-nbctl: duplicate nexthop for the same ECMP route\n", err:&{%!!(MISSING)g(string=OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist --policy=src-ip --ecmp-symmetric-reply lr-route-add GR_worker-148 192.168.9.236/32 198.19.16.25' failed: exit status 1)}w

ovnkube-node routes:

Version-Release number of selected component (if applicable):
OCP 4.7.24

sh-4.4# ovn-nbctl lr-route-list GR_worker-148
IPv4 Routes
    192.168.9.176    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.177    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.181    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.187    198.19.3.9      src-ip ecmp-symmetric-reply
    192.168.9.192    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.202    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.203    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.228    198.19.16.1     src-ip ecmp-symmetric-reply
    192.168.9.236    198.19.16.25    src-ip ecmp-symmetric-reply
    192.168.0.0/16   100.64.0.1      dst-ip
    0.0.0.0/0        10.75.69.129    dst-ip rtoe-GR_worker-148

I1111 20:33:40.812500 1 ovn.go:584] [146197bb-2021-4858-a5b8-2b4689b33494/thingspace/splunk-operator-5f74449b7c-vd99x] retry pod setup
I1111 20:33:40.812554 1 pods.go:338] LSP already exists for port: thingspace_splunk-operator-5f74449b7c-vd99x
I1111 20:33:40.823426 1 pods.go:302] [thingspace/splunk-operator-5f74449b7c-vd99x] addLogicalPort took 10.890013ms
I1111 20:33:40.823507 1 ovn.go:590] [146197bb-2021-4858-a5b8-2b4689b33494/thingspace/splunk-operator-5f74449b7c-vd99x] setup retry failed; will try again later
I1111 20:33:40.823618 1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"thingspace", Name:"splunk-operator-5f74449b7c-vd99x", UID:"146197bb-2021-4858-a5b8-2b4689b33494", APIVersion:"v1", ResourceVersion:"815718841", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' unable to add external gwStr src-ip route to GR router, stderr:"ovn-nbctl: duplicate nexthop for the same ECMP route\n", err:&{%!!(MISSING)g(string=OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist --policy=src-ip --ecmp-symmetric-reply lr-route-add GR_worker-148 192.168.9.236/32 198.19.16.25' failed: exit status 1)}w
I1111 20:34:40.891795 1 ovn.go:584] [146197bb-2021-4858-a5b8-2b4689b33494/thingspace/splunk-operator-5f74449b7c-vd99x] retry pod setup
I1111 20:34:40.891976 1 pods.go:338] LSP already exists for port: thingspace_splunk-operator-5f74449b7c-vd99x
I1111 20:34:40.902447 1 pods.go:302] [thingspace/splunk-operator-5f74449b7c-vd99x] addLogicalPort took 10.634355ms
I1111 20:34:40.902513 1 ovn.go:590] [146197bb-2021-4858-a5b8-2b4689b33494/thingspace/splunk-operator-5f74449b7c-vd99x] setup retry failed; will try again later
I1111 20:34:40.902552 1 event.go:282] Event(v1.ObjectReference{Kind:"Pod", Namespace:"thingspace", Name:"splunk-operator-5f74449b7c-vd99x", UID:"146197bb-2021-4858-a5b8-2b4689b33494", APIVersion:"v1", ResourceVersion:"815718841", FieldPath:""}): type: 'Warning' reason: 'ErrorAddingLogicalPort' unable to add external gwStr src-ip route to GR router, stderr:"ovn-nbctl: duplicate nexthop for the same ECMP route\n", err:&{%!!(MISSING)g(string=OVN command '/usr/bin/ovn-nbctl --timeout=15 --may-exist --policy=src-ip --ecmp-symmetric-reply lr-route-add GR_worker-148 192.168.9.236/32 198.19.16.25' failed: exit status 1)}w

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
*** Bug 2027854 has been marked as a duplicate of this bug. ***
Hi, Any update on how one can verify this bz? Thanks, Alan
There are a few things to verify with these fixes:

Scenario 1:
1. app pod is created in ns foo
2. exgwAPod is created in ns exgw1 (172.0.1.1), serving ns foo
3. exgwAPod is created in ns exgw2 (172.0.1.2), serving ns foo
4. Verify there is an ECMP route for both 172.0.1.1 and 172.0.1.2.
5. Delete the exgw pods, verify that both routes are removed.

Scenario 2:
1. app pod is created in ns foo
2. exgwAPod is created in ns exgw1 (172.0.1.1), serving ns foo
3. exgwAPod is created in ns exgw2 (172.0.1.1), serving ns foo (duplicate IP in annotation)
4. Verify that only one ECMP route exists.
5. Verify there is a log present: "unable to add pod: exgw2/exgwAPod as external gateway for namespace: foo"
6. Delete both exgwAPods, ensure there is no ECMP route present afterwards.

Scenario 3:
1. app pod is created in ns foo
2. Annotate the foo namespace with the exgw annotation, but use duplicate IPs:
   k8s.ovn.org/routing-external-gws: 172.18.0.4,172.18.0.5,172.18.0.4
3. Verify that there are only 2 ECMP routes present (no duplicates).
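For Scenario 3, the expected route count can be worked out from the annotation value before checking the router. The following is only a minimal sketch of that calculation, not the ovn-kubernetes implementation; the function name unique_gateways is hypothetical, and it assumes the annotation value is a plain comma-separated list of IP addresses.

```python
import ipaddress


def unique_gateways(annotation: str) -> list[str]:
    """Return the gateway IPs from a comma-separated annotation value.

    Duplicates are dropped and first-seen order is kept. This is a
    hypothetical helper for test verification, not ovn-kubernetes code.
    """
    seen = set()
    result = []
    for raw in annotation.split(","):
        raw = raw.strip()
        if not raw:
            continue  # skip empty entries such as a trailing comma
        ip = ipaddress.ip_address(raw)  # raises ValueError on malformed input
        if ip not in seen:
            seen.add(ip)
            result.append(str(ip))
    return result


# Annotation value from Scenario 3: three entries, two distinct gateways.
gateways = unique_gateways("172.18.0.4,172.18.0.5,172.18.0.4")
print(gateways)       # ['172.18.0.4', '172.18.0.5']
print(len(gateways))  # 2, the number of ECMP routes expected per app pod
```

The length of the returned list is the number of ECMP routes Scenario 3 expects for the app pod's IP in the `ovn-nbctl lr-route-list` output of the gateway router.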
or is there something you need from us?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056