Description of problem:
There are several paths where deleting a pod acting as an external gateway does not clean up routes in the served namespaces:

1) A host-networked pod is acting as an external gateway pod. When it is deleted, the routes are not removed for pods in the served external gateway namespaces.
2) A pod acting as an external gateway is removed while ovnkube-master is restarting, or while the ovnkube-master cache has somehow become invalid. The routes are not removed for pods in the served external gateway namespaces.
3) More of a corner case: an exgw pod fails to be added, but its route still gets programmed into OVN. If the pod is then updated with a change to its exgw annotations, the new annotation is added as a route in OVN, but the old route is not removed.
4) ovnkube-master is restarted, then the exgw pod is deleted. The stale ECMP routes are not removed. This is caused by a bug where OVN complains about "duplicate nexthop" while the cache is recreated during ovnkube-master start. We had a workaround for this, but it was checking for the wrong message.
5) ovnkube-master is restarted, and while it is down, the exgw pod is deleted. This is the hardest case to fix because we get no event. We need to come up with a sync method for this.

I've posted a PR that will fix cases 1-4: https://github.com/ovn-org/ovn-kubernetes/pull/2302
Note: case 5 is really the same as case 2. I found that there were two issues causing it to occur, and I have fixed one of them.
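To illustrate the kind of sync method case 5 calls for, here is a minimal sketch in Go. It is not taken from either PR and is not necessarily how the fix is implemented. The `route` type and the `staleRoutes` and `deleteArgs` helpers are hypothetical names introduced only for this example. The idea is that on startup, with no delete event to rely on, the routes present in OVN are compared against the next hops of gateway pods that still exist, and anything left over is treated as stale. The argument list mirrors the lr-route-del invocation seen in the ovnkube-master logs.

```go
package main

import (
	"fmt"
	"strings"
)

// route is a hypothetical, simplified view of one exgw ECMP route in OVN.
type route struct {
	Router    string // gateway router, e.g. "GR_master-0-1"
	PodPrefix string // served pod IP as a host prefix, e.g. "fd01:0:0:1::2b/128"
	NextHop   string // external gateway IP the route points at
}

// staleRoutes returns every existing route whose next hop does not belong
// to a gateway that is still alive. The order of the input is preserved.
func staleRoutes(existing []route, liveGateways map[string]bool) []route {
	var stale []route
	for _, r := range existing {
		if !liveGateways[r.NextHop] {
			stale = append(stale, r)
		}
	}
	return stale
}

// deleteArgs builds the argument list for removing one route, following the
// shape of the lr-route-del command that appears in the logs.
func deleteArgs(r route) []string {
	return []string{"--if-exists", "--policy=src-ip", "--",
		"lr-route-del", r.Router, r.PodPrefix, r.NextHop}
}

func main() {
	// Assumed state: two routes were programmed, but only one gateway IP
	// is still backed by a running pod.
	existing := []route{
		{"GR_master-0-1", "fd01:0:0:1::2b/128", "fd2e:6f44:5dd8::8a"},
		{"GR_master-0-1", "fd01:0:0:1::2b/128", "fd2e:6f44:5dd8::8f"},
	}
	live := map[string]bool{"fd2e:6f44:5dd8::8f": true}

	// Print the delete command for each stale route instead of running it.
	for _, r := range staleRoutes(existing, live) {
		fmt.Println(strings.Join(deleteArgs(r), " "))
	}
}
```

The sketch only computes and prints the deletions. A real sync would also need to read the current routes from the OVN northbound database and build the live gateway set from the pod and namespace annotations, both of which are left out here.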
*** Bug 1974430 has been marked as a duplicate of this bug. ***
Updated https://github.com/ovn-org/ovn-kubernetes/pull/2348 to handle the final case; it is under review.
https://github.com/ovn-org/ovn-kubernetes/pull/2348 has been merged. We need to open a downstream merge PR to get this in.
Verified in 4.9.0-0.nightly-2021-08-31-123131

I0901 20:49:39.933792 1 egressgw.go:54] External gateway pod: testpod1, detected for namespace(s) exgw
I0901 20:49:39.934041 1 egressgw.go:85] Adding routes for external gateway pod: testpod1, next hops: "fd2e:6f44:5dd8::8a,fd2e:6f44:5dd8::8f", namespace: exgw, bfd-enabled: false
I0901 20:49:59.881049 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Namespace total 8 items received
I0901 20:50:06.108886 1 node_tracker.go:162] Processing possible switch / router updates for node master-0-2
I0901 20:50:06.115200 1 node_tracker.go:162] Processing possible switch / router updates for node master-0-0
I0901 20:50:10.829808 1 egressgw.go:54] External gateway pod: testpod1, detected for namespace(s) exgw
I0901 20:50:10.829878 1 egressgw.go:85] Adding routes for external gateway pod: testpod1, next hops: "fd2e:6f44:5dd8::8a,fd2e:6f44:5dd8::8f", namespace: exgw, bfd-enabled: false
I0901 20:50:10.839163 1 egressgw.go:54] External gateway pod: testpod1, detected for namespace(s) exgw
I0901 20:50:10.839310 1 egressgw.go:85] Adding routes for external gateway pod: testpod1, next hops: "fd2e:6f44:5dd8::8a,fd2e:6f44:5dd8::8f", namespace: exgw, bfd-enabled: false
I0901 20:50:10.849736 1 egressgw.go:176] Deleting routes for external gateway pod: testpod1, for namespace(s) exgw
2021-09-01T20:50:10.854Z|06283|unixctl|DBG|received request run["--if-exists","--policy=src-ip","--","lr-route-del","GR_master-0-1","fd01:0:0:1::2b/128","fd2e:6f44:5dd8::8a"], id=0
2021-09-01T20:50:10.855Z|06284|ovn_dbctl|INFO|Running command run --if-exists --policy=src-ip -- lr-route-del GR_master-0-1 fd01:0:0:1::2b/128 fd2e:6f44:5dd8::8a
2021-09-01T20:50:10.862Z|06285|unixctl|DBG|replying with success, id=0: ""
2021-09-01T20:50:10.865Z|06286|unixctl|DBG|received request run["--format=csv","--data=bare","--no-headings","--columns=bfd","--","find","Logical_Router_Static_Route","output_port=rtoe-GR_master-0-1","nexthop=\"fd2e:6f44:5dd8::8a\"","bfd!=[]"], id=0
2021-09-01T20:50:10.865Z|06287|ovn_dbctl|DBG|Running command run --format=csv --data=bare --no-headings --columns=bfd -- find Logical_Router_Static_Route output_port=rtoe-GR_master-0-1 "nexthop=\"fd2e:6f44:5dd8::8a\"" bfd!=[]
2021-09-01T20:50:10.865Z|06288|unixctl|DBG|replying with success, id=0: ""
2021-09-01T20:50:10.869Z|06289|unixctl|DBG|received request run["--format=csv","--data=bare","--no-headings","--columns=_uuid","--","find","BFD","logical_port=rtoe-GR_master-0-1","dst_ip=\"fd2e:6f44:5dd8::8a\""], id=0
2021-09-01T20:50:10.870Z|06290|ovn_dbctl|DBG|Running command run --format=csv --data=bare --no-headings --columns=_uuid -- find BFD logical_port=rtoe-GR_master-0-1 "dst_ip=\"fd2e:6f44:5dd8::8a\""
2021-09-01T20:50:10.870Z|06291|unixctl|DBG|replying with success, id=0: ""
I0901 20:50:10.870985 1 egressgw.go:548] Did not find bfd entry for rtoe-GR_master-0-1 fd2e:6f44:5dd8::8a
2021-09-01T20:50:10.875Z|06292|unixctl|DBG|received request run["--if-exists","--policy=src-ip","--","lr-route-del","GR_master-0-1","fd01:0:0:1::2b/128","fd2e:6f44:5dd8::8f"], id=0
2021-09-01T20:50:10.875Z|06293|ovn_dbctl|INFO|Running command run --if-exists --policy=src-ip -- lr-route-del GR_master-0-1 fd01:0:0:1::2b/128 fd2e:6f44:5dd8::8f
2021-09-01T20:50:10.879Z|06294|unixctl|DBG|replying with success, id=0: ""
2021-09-01T20:50:10.883Z|06295|unixctl|DBG|received request run["--format=csv","--data=bare","--no-headings","--columns=bfd","--","find","Logical_Router_Static_Route","output_port=rtoe-GR_master-0-1","nexthop=\"fd2e:6f44:5dd8::8f\"","bfd!=[]"], id=0
2021-09-01T20:50:10.883Z|06296|ovn_dbctl|DBG|Running command run --format=csv --data=bare --no-headings --columns=bfd -- find Logical_Router_Static_Route output_port=rtoe-GR_master-0-1 "nexthop=\"fd2e:6f44:5dd8::8f\"" bfd!=[]
2021-09-01T20:50:10.883Z|06297|unixctl|DBG|replying with success, id=0: ""
2021-09-01T20:50:10.887Z|06298|unixctl|DBG|received request run["--format=csv","--data=bare","--no-headings","--columns=_uuid","--","find","BFD","logical_port=rtoe-GR_master-0-1","dst_ip=\"fd2e:6f44:5dd8::8f\""], id=0
2021-09-01T20:50:10.887Z|06299|ovn_dbctl|DBG|Running command run --format=csv --data=bare --no-headings --columns=_uuid -- find BFD logical_port=rtoe-GR_master-0-1 "dst_ip=\"fd2e:6f44:5dd8::8f\""
2021-09-01T20:50:10.887Z|06300|unixctl|DBG|replying with success, id=0: ""
I0901 20:50:10.887943 1 egressgw.go:548] Did not find bfd entry for rtoe-GR_master-0-1 fd2e:6f44:5dd8::8f
I0901 20:50:22.888957 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1.Endpoints total 14 items received
I0901 20:50:24.347027 1 reflector.go:530] k8s.io/client-go/informers/factory.go:134: Watch close - *v1beta1.EndpointSlice total 23 items received
W0901 20:50:24.348964 1 warnings.go:70] discovery.k8s.io/v1beta1 EndpointSlice is deprecated in v1.21+, unavailable in v1.25+; use discovery.k8s.io/v1 EndpointSlice
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759