Description of problem:
openshift-dns pods are scheduled on every node. However, after removing one worker node, the old DNS pod IP is not removed from the load_balancer VIPs.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-25-185404

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster
2. Delete one worker node
3. Check the openshift-dns pods

oc get pod -n openshift-dns
NAME                READY   STATUS    RESTARTS   AGE     IP            NODE                                     NOMINATED NODE   READINESS GATES
dns-default-2v8gd   2/2     Running   0          7h43m   10.129.0.13   juzhao-share1-2sqt7-master-1             <none>           <none>
dns-default-d8nhr   2/2     Running   0          128m    10.129.4.5    juzhao-share1-2sqt7-worker-nffg2         <none>           <none>
dns-default-fjr4q   2/2     Running   0          6h29m   10.130.2.5    juzhao-share1-2sqt7-rhel-1               <none>           <none>
dns-default-fp6x8   2/2     Running   0          7h43m   10.130.0.36   juzhao-share1-2sqt7-master-2             <none>           <none>
dns-default-gjpj7   2/2     Running   0          7h43m   10.128.0.13   juzhao-share1-2sqt7-master-0             <none>           <none>
dns-default-klwbs   2/2     Running   0          5h18m   10.128.4.6    juzhao-share1-2sqt7-juzhao-share1-79-0   <none>           <none>
dns-default-nhpzm   2/2     Running   0          6h29m   10.129.2.5    juzhao-share1-2sqt7-rhel-0               <none>           <none>
dns-default-smrct   2/2     Running   0          5h18m   10.131.2.5    juzhao-share1-2sqt7-juzhao-share1-79-1   <none>

sh-4.4# ovn-nbctl list load_balancer | grep 172.30.0.10 -B 6
health_check        : []
ip_port_mappings    : {}
name                : "Service_openshift-dns/dns-default_UDP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"172.30.0.10:53"="10.128.0.13:5353,10.128.2.5:5353,10.129.0.13:5353,10.129.2.5:5353,10.130.0.36:5353,10.130.2.5:5353,10.131.0.12:5353"}
--
health_check        : []
ip_port_mappings    : {}
name                : "Service_openshift-dns/dns-default_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.0.10:53"="10.128.0.13:5353,10.128.2.5:5353,10.129.0.13:5353,10.129.2.5:5353,10.130.0.36:5353,10.130.2.5:5353,10.131.0.12:5353", "172.30.0.10:9154"="10.128.0.13:9154,10.128.2.5:9154,10.129.0.13:9154,10.129.2.5:9154,10.130.0.36:9154,10.130.2.5:9154,10.131.0.12:9154"}

Actual results:
From the output above, the DNS pod IPs and the load_balancer VIPs are inconsistent.

Expected results:
The load_balancer VIPs should only list backends that correspond to current DNS pod IPs.

Additional info:
FYI...10.128.2.5:5353 and 10.131.0.12:5353 should be removed.
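For anyone triaging similar reports, here is a minimal Go sketch of the consistency check above. This is not product code: it assumes `oc` and `ovn-nbctl` are both reachable from wherever it runs (e.g. copied output or an ovnkube-master pod), and that dns-default pods carry the daemonset label used below.

// stale-backends.go: rough triage sketch, not product code.
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

func main() {
	// Current dns-default pod IPs, keyed for fast lookup.
	// The label selector here is an assumption; adjust for your cluster.
	out, err := exec.Command("oc", "get", "pods", "-n", "openshift-dns",
		"-l", "dns.operator.openshift.io/daemonset-dns=default",
		"-o", "jsonpath={.items[*].status.podIP}").Output()
	if err != nil {
		panic(err)
	}
	live := map[string]bool{}
	for _, ip := range strings.Fields(string(out)) {
		live[ip] = true
	}

	// Backend IPs referenced by the load_balancer vips column.
	lbs, err := exec.Command("ovn-nbctl", "list", "load_balancer").Output()
	if err != nil {
		panic(err)
	}
	re := regexp.MustCompile(`(\d+\.\d+\.\d+\.\d+):5353`)
	for _, m := range re.FindAllStringSubmatch(string(lbs), -1) {
		if !live[m[1]] {
			fmt.Println("stale backend:", m[0])
		}
	}
}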
I reproduced this. This bug will cause cluster disruption if a node hosting critical infra pods gets deleted: the endpoints won't be reachable. I am guessing that during upgrades we'll reboot nodes and shift load such as DNS pods onto other nodes; auth will fail if coredns pods are spawned on different nodes and the endpoints don't get updated in ovn-kubernetes. In general, having stale endpoints is a critical issue. I am marking this as a blocker for now. Working on a fix.

Steps to reproduce:

[surya@hidden-temple contrib]$ oc get nodes
NAME                STATUS   ROLES                  AGE   VERSION
ovn-control-plane   Ready    control-plane,master   15m   v1.20.0
ovn-worker          Ready    <none>                 14m   v1.20.0
ovn-worker2         Ready    <none>                 14m   v1.20.0

[surya@hidden-temple contrib]$ oc get pods -A -owide
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-h95vc   1/1     Running   0          15m   10.244.2.3   ovn-worker2   <none>           <none>
kube-system   coredns-74ff55c5b-kh8w8   1/1     Running   0          15m   10.244.1.3   ovn-worker    <none>           <none>

sh-5.0# ovn-nbctl list load_balancer
_uuid               : 3b912b21-f0c4-4afb-8906-743f13727bbc
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53", "10.96.0.10:9153"="10.244.1.3:9153,10.244.2.3:9153"}

_uuid               : 0d2bc5d2-3687-439a-9bd1-30426dfb43c2
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_UDP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53"}

[surya@hidden-temple contrib]$ oc delete node ovn-worker
node "ovn-worker" deleted
I0827 08:51:59.868897      50 ovn.go:1068] Delete event for Node "ovn-worker". Removing the node from various caches
I0827 08:51:59.868978      50 master.go:1063] Deleted HostSubnet 10.244.1.0/24 for node ovn-worker
I0827 08:51:59.869092      50 ovs.go:209] exec(146): /usr/bin/ovn-nbctl --timeout=15 --if-exist ls-del ovn-worker
I0827 08:51:59.873981      50 ovs.go:212] exec(146): stdout: ""
I0827 08:51:59.874022      50 ovs.go:213] exec(146): stderr: ""
I0827 08:51:59.874095      50 ovs.go:209] exec(147): /usr/bin/ovn-nbctl --timeout=15 --if-exist lrp-del rtos-ovn-worker
I0827 08:51:59.880002      50 ovs.go:212] exec(147): stdout: ""
I0827 08:51:59.880088      50 ovs.go:213] exec(147): stderr: ""

We keep seeing the following in the logs:

I0827 08:52:51.185522      50 ovs.go:209] exec(172): /usr/bin/ovn-nbctl --timeout=15 set load_balancer 3b912b21-f0c4-4afb-8906-743f13727bbc external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_TCP_cluster options:event=false options:reject=true options:skip_snat=false protocol=tcp selection_fields=[] vips={"10.96.0.10:53"="10.244.2.3:53,10.244.2.5:53","10.96.0.10:9153"="10.244.2.3:9153,10.244.2.5:9153"} -- --if-exists ls-lb-del ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- --if-exists lr-lb-del GR_ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- set load_balancer 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_UDP_cluster options:event=false options:reject=true options:skip_snat=false protocol=udp selection_fields=[] vips={"10.96.0.10:53"="10.244.2.3:53,10.244.2.5:53"} -- --if-exists ls-lb-del ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 -- --if-exists lr-lb-del GR_ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2
I0827 08:52:51.189255      50 ovs.go:212] exec(172): stdout: ""
I0827 08:52:51.189287      50 ovs.go:213] exec(172): stderr: "ovn-nbctl: ovn-worker: switch name not found\n"
I0827 08:52:51.189303      50 ovs.go:215] exec(172): err: exit status 1
I0827 08:52:51.189351      50 services_controller.go:244] Finished syncing service kube-dns on namespace kube-system : 4.434042ms
I0827 08:52:51.189425      50 services_controller.go:218] "Error syncing service, retrying" service="kube-system/kube-dns" err="failed to ensure service kube-system/kube-dns load balancers: failed to commit load balancer changes for map[string]string{\"k8s.ovn.org/kind\":\"Service\", \"k8s.ovn.org/owner\":\"kube-system/kube-dns\"}: OVN command '/usr/bin/ovn-nbctl --timeout=15 set load_balancer 3b912b21-f0c4-4afb-8906-743f13727bbc external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_TCP_cluster options:event=false options:reject=true options:skip_snat=false protocol=tcp selection_fields=[] vips={\"10.96.0.10:53\"=\"10.244.2.3:53,10.244.2.5:53\",\"10.96.0.10:9153\"=\"10.244.2.3:9153,10.244.2.5:9153\"} -- --if-exists ls-lb-del ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- --if-exists lr-lb-del GR_ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- set load_balancer 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_UDP_cluster options:event=false options:reject=true options:skip_snat=false protocol=udp selection_fields=[] vips={\"10.96.0.10:53\"=\"10.244.2.3:53,10.244.2.5:53\"} -- --if-exists ls-lb-del ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 -- --if-exists lr-lb-del GR_ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2' failed: exit status 1"
Final state:

[surya@hidden-temple contrib]$ oc get pods -A -owide
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE    IP           NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-55ql5   1/1     Running   0          108s   10.244.2.5   ovn-worker2   <none>           <none>
kube-system   coredns-74ff55c5b-h95vc   1/1     Running   0          19m    10.244.2.3   ovn-worker2   <none>           <none>

and...

_uuid               : 3b912b21-f0c4-4afb-8906-743f13727bbc
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53", "10.96.0.10:9153"="10.244.1.3:9153,10.244.2.3:9153"}

_uuid               : 0d2bc5d2-3687-439a-9bd1-30426dfb43c2
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_UDP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53"}

As we can see, the VIPs didn't get updated to reflect the new coredns pod (10.244.2.5); they still reference the deleted node's pod IP (10.244.1.3).
So here is the main problem: when we delete a node, we remove its gateway router and switch first, then proceed to delete the pods and services that were on that node; for replicated workloads, the pods are recreated on other nodes, which requires a load_balancer update. The way I see it, we have two issues here:

A) Stale load balancers: when we delete the switch/routers, we cannot call the destroy command because we don't know whether that load balancer is shared with other switches/routers/the cluster in general. Example, a ClusterIP load balancer:

_uuid               : 9f58dc58-ab99-4356-abed-de3e2f422376
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="default/service-backed-server-on-ovn-worker2"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_default/service-backed-server-on-ovn-worker2_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.155.125:80"="10.244.2.6:80"}

So instead we call the ls-lb-del and lr-lb-del commands when we encounter a node deletion, but since the switch is already gone, OVN complains and won't remove the load balancer references. We end up with stale node-specific load balancers, plus stale associations of cluster-level load balancers with the deleted switches and routers.

B) Load balancer update transactions are batched for performance reasons: we send a "set load_balancer" call to update the VIPs together with the "ls-lb-del" and "lr-lb-del" commands. The whole batch fails if any single command fails. In this case, all the delete load balancer calls fail with a "switch not found" or "router not found" error, so the VIP update is rolled back as well. (See the sketch below.)
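To make issue B concrete, here is a rough Go sketch of the shape of that batched transaction. This is illustrative only; the real batching lives in ovn-kubernetes' services controller, and the helper name here is invented.

package main

import (
	"fmt"
	"strings"
)

// buildBatchedLBUpdate mimics the shape of the exec(172) transaction in
// the logs above. All "--"-separated sub-commands run in a single
// ovn-nbctl invocation, so when "ls-lb-del ovn-worker ..." fails with
// "switch name not found" (note how --if-exists did not prevent that
// error in the logs above), the "set load_balancer" VIP update in the
// same batch never takes effect either.
func buildBatchedLBUpdate(lbUUID, node string, vips map[string]string) []string {
	args := []string{"--timeout=15", "set", "load_balancer", lbUUID}
	for vip, backends := range vips {
		args = append(args, fmt.Sprintf(`vips:"%s"="%s"`, vip, backends))
	}
	return append(args,
		"--", "--if-exists", "ls-lb-del", node, lbUUID,
		"--", "--if-exists", "lr-lb-del", "GR_"+node, lbUUID)
}

func main() {
	args := buildBatchedLBUpdate("3b912b21-f0c4-4afb-8906-743f13727bbc", "ovn-worker",
		map[string]string{"10.96.0.10:53": "10.244.2.3:53,10.244.2.5:53"})
	fmt.Println("ovn-nbctl " + strings.Join(args, " "))
}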
The solution is to check whether the switch or router still exists, and only then attempt the ls-lb-del; when the switch is already gone, there is no point in trying to delete the reference. Working on a PR, will post it shortly.
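A minimal sketch of that guard, with invented helper names (the actual change lands in the ovn-kubernetes services controller code path shown in the logs above):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// nbExists reports whether a row named `name` exists in `table`.
// With --if-exists, `get` prints nothing (and exits 0) for a missing
// row, so empty output means the row is gone.
func nbExists(table, name string) bool {
	out, err := exec.Command("ovn-nbctl", "--if-exists", "get", table, name, "_uuid").Output()
	return err == nil && strings.TrimSpace(string(out)) != ""
}

// detachLBArgs emits ls-lb-del/lr-lb-del sub-commands only for a switch
// and gateway router that still exist, so a node that was already
// removed cannot fail the whole batched transaction.
func detachLBArgs(lbUUID, node string) []string {
	var args []string
	if nbExists("logical_switch", node) {
		args = append(args, "--", "--if-exists", "ls-lb-del", node, lbUUID)
	}
	if nbExists("logical_router", "GR_"+node) {
		args = append(args, "--", "--if-exists", "lr-lb-del", "GR_"+node, lbUUID)
	}
	return args
}

func main() {
	fmt.Println(detachLBArgs("3b912b21-f0c4-4afb-8906-743f13727bbc", "ovn-worker"))
}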
An alternative, and more accurate, solution is to remove the deleted node's switches and routers from all LBs in the LB cache. Upstream PR posted: https://github.com/ovn-org/ovn-kubernetes/pull/2457
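Roughly, the cache-based approach looks like this (a sketch with an invented cache shape, not the PR's actual types): on node delete, drop the node's switch and gateway router from every cached load balancer, so later syncs never emit deletes against objects that no longer exist.

package main

import "fmt"

// cachedLB loosely mirrors a cached load-balancer entry: which switches
// and routers it is attached to. These types are illustrative only.
type cachedLB struct {
	UUID     string
	Switches map[string]bool // logical switches this LB is attached to
	Routers  map[string]bool // gateway routers this LB is attached to
}

// removeNodeFromLBCache drops the deleted node's logical switch and
// gateway router from every cached LB, so subsequent service syncs
// compute delete operations only against objects that still exist.
func removeNodeFromLBCache(cache map[string]*cachedLB, node string) {
	for _, lb := range cache {
		delete(lb.Switches, node)
		delete(lb.Routers, "GR_"+node)
	}
}

func main() {
	cache := map[string]*cachedLB{
		"kube-dns-tcp": {
			UUID:     "3b912b21-f0c4-4afb-8906-743f13727bbc",
			Switches: map[string]bool{"ovn-worker": true, "ovn-worker2": true},
			Routers:  map[string]bool{"GR_ovn-worker": true, "GR_ovn-worker2": true},
		},
	}
	removeNodeFromLBCache(cache, "ovn-worker")
	fmt.Println(cache["kube-dns-tcp"].Switches, cache["kube-dns-tcp"].Routers)
}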
Verified this bug on 4.9.0-0.nightly-2021-09-01-193941. The VIPs are deleted after the node is deleted, and re-added when the node is rolled back.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759