Description of problem:
openshift-dns pods are scheduled on every node. However, after removing one worker node, the old DNS pod IP is not removed from the load_balancer VIPs.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-25-185404

How reproducible:
Always

Steps to Reproduce:
1. Set up a cluster
2. Delete one worker node
3. Check the openshift-dns pods

oc get pod -n openshift-dns
NAME                READY   STATUS    RESTARTS   AGE     IP            NODE                                     NOMINATED NODE   READINESS GATES
dns-default-2v8gd   2/2     Running   0          7h43m   10.129.0.13   juzhao-share1-2sqt7-master-1             <none>           <none>
dns-default-d8nhr   2/2     Running   0          128m    10.129.4.5    juzhao-share1-2sqt7-worker-nffg2         <none>           <none>
dns-default-fjr4q   2/2     Running   0          6h29m   10.130.2.5    juzhao-share1-2sqt7-rhel-1               <none>           <none>
dns-default-fp6x8   2/2     Running   0          7h43m   10.130.0.36   juzhao-share1-2sqt7-master-2             <none>           <none>
dns-default-gjpj7   2/2     Running   0          7h43m   10.128.0.13   juzhao-share1-2sqt7-master-0             <none>           <none>
dns-default-klwbs   2/2     Running   0          5h18m   10.128.4.6    juzhao-share1-2sqt7-juzhao-share1-79-0   <none>           <none>
dns-default-nhpzm   2/2     Running   0          6h29m   10.129.2.5    juzhao-share1-2sqt7-rhel-0               <none>           <none>
dns-default-smrct   2/2     Running   0          5h18m   10.131.2.5    juzhao-share1-2sqt7-juzhao-share1-79-1   <none>

sh-4.4# ovn-nbctl list load_balancer | grep 172.30.0.10 -B 6
health_check        : []
ip_port_mappings    : {}
name                : "Service_openshift-dns/dns-default_UDP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"172.30.0.10:53"="10.128.0.13:5353,10.128.2.5:5353,10.129.0.13:5353,10.129.2.5:5353,10.130.0.36:5353,10.130.2.5:5353,10.131.0.12:5353"}
--
health_check        : []
ip_port_mappings    : {}
name                : "Service_openshift-dns/dns-default_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"172.30.0.10:53"="10.128.0.13:5353,10.128.2.5:5353,10.129.0.13:5353,10.129.2.5:5353,10.130.0.36:5353,10.130.2.5:5353,10.131.0.12:5353", "172.30.0.10:9154"="10.128.0.13:9154,10.128.2.5:9154,10.129.0.13:9154,10.129.2.5:9154,10.130.0.36:9154,10.130.2.5:9154,10.131.0.12:9154"}

Actual results:
From the output above, the DNS pod IPs and the load_balancer VIPs are inconsistent.

Expected results:
The load_balancer VIPs should only list backends that correspond to current DNS pod IPs.

Additional info:
FYI...10.128.2.5:5353 and 10.131.0.12:5353 should be removed.
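For anyone triaging similar reports, here is a minimal Go sketch of the consistency check above. This is not product code: it assumes `oc` and `ovn-nbctl` are both reachable from wherever it runs (e.g. copied output or an ovnkube-master pod), and that dns-default pods carry the daemonset label used below.

// stale-backends.go: rough triage sketch, not product code.
package main

import (
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

func main() {
	// Current dns-default pod IPs, keyed for fast lookup.
	// The label selector here is an assumption; adjust for your cluster.
	out, err := exec.Command("oc", "get", "pods", "-n", "openshift-dns",
		"-l", "dns.operator.openshift.io/daemonset-dns=default",
		"-o", "jsonpath={.items[*].status.podIP}").Output()
	if err != nil {
		panic(err)
	}
	live := map[string]bool{}
	for _, ip := range strings.Fields(string(out)) {
		live[ip] = true
	}

	// Backend IPs referenced by the load_balancer vips column.
	lbs, err := exec.Command("ovn-nbctl", "list", "load_balancer").Output()
	if err != nil {
		panic(err)
	}
	re := regexp.MustCompile(`(\d+\.\d+\.\d+\.\d+):5353`)
	for _, m := range re.FindAllStringSubmatch(string(lbs), -1) {
		if !live[m[1]] {
			fmt.Println("stale backend:", m[0])
		}
	}
}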
I reproduced this. This bug will cause cluster disruption if a node hosting critical infra pods gets deleted: the endpoints won't be reachable. I am guessing that during upgrades we'll reboot nodes and shift load such as DNS pods onto other nodes; auth will fail if coredns pods are spawned on different nodes and the endpoints don't get updated in ovn-kubernetes. In general, having stale endpoints is a critical issue. I am marking this as a blocker for now. Working on a fix.

Steps to reproduce:

[surya@hidden-temple contrib]$ oc get nodes
NAME                STATUS   ROLES                  AGE   VERSION
ovn-control-plane   Ready    control-plane,master   15m   v1.20.0
ovn-worker          Ready    <none>                 14m   v1.20.0
ovn-worker2         Ready    <none>                 14m   v1.20.0

[surya@hidden-temple contrib]$ oc get pods -A -owide
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-h95vc   1/1     Running   0          15m   10.244.2.3   ovn-worker2   <none>           <none>
kube-system   coredns-74ff55c5b-kh8w8   1/1     Running   0          15m   10.244.1.3   ovn-worker    <none>           <none>

sh-5.0# ovn-nbctl list load_balancer
_uuid               : 3b912b21-f0c4-4afb-8906-743f13727bbc
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53", "10.96.0.10:9153"="10.244.1.3:9153,10.244.2.3:9153"}

_uuid               : 0d2bc5d2-3687-439a-9bd1-30426dfb43c2
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_UDP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53"}

[surya@hidden-temple contrib]$ oc delete node ovn-worker
node "ovn-worker" deleted
I0827 08:51:59.868897      50 ovn.go:1068] Delete event for Node "ovn-worker". Removing the node from various caches
I0827 08:51:59.868978      50 master.go:1063] Deleted HostSubnet 10.244.1.0/24 for node ovn-worker
I0827 08:51:59.869092      50 ovs.go:209] exec(146): /usr/bin/ovn-nbctl --timeout=15 --if-exist ls-del ovn-worker
I0827 08:51:59.873981      50 ovs.go:212] exec(146): stdout: ""
I0827 08:51:59.874022      50 ovs.go:213] exec(146): stderr: ""
I0827 08:51:59.874095      50 ovs.go:209] exec(147): /usr/bin/ovn-nbctl --timeout=15 --if-exist lrp-del rtos-ovn-worker
I0827 08:51:59.880002      50 ovs.go:212] exec(147): stdout: ""
I0827 08:51:59.880088      50 ovs.go:213] exec(147): stderr: ""

We keep seeing the following in the logs:

I0827 08:52:51.185522      50 ovs.go:209] exec(172): /usr/bin/ovn-nbctl --timeout=15 set load_balancer 3b912b21-f0c4-4afb-8906-743f13727bbc external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_TCP_cluster options:event=false options:reject=true options:skip_snat=false protocol=tcp selection_fields=[] vips={"10.96.0.10:53"="10.244.2.3:53,10.244.2.5:53","10.96.0.10:9153"="10.244.2.3:9153,10.244.2.5:9153"} -- --if-exists ls-lb-del ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- --if-exists lr-lb-del GR_ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- set load_balancer 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_UDP_cluster options:event=false options:reject=true options:skip_snat=false protocol=udp selection_fields=[] vips={"10.96.0.10:53"="10.244.2.3:53,10.244.2.5:53"} -- --if-exists ls-lb-del ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 -- --if-exists lr-lb-del GR_ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2
I0827 08:52:51.189255      50 ovs.go:212] exec(172): stdout: ""
I0827 08:52:51.189287      50 ovs.go:213] exec(172): stderr: "ovn-nbctl: ovn-worker: switch name not found\n"
I0827 08:52:51.189303      50 ovs.go:215] exec(172): err: exit status 1
I0827 08:52:51.189351      50 services_controller.go:244] Finished syncing service kube-dns on namespace kube-system : 4.434042ms
I0827 08:52:51.189425      50 services_controller.go:218] "Error syncing service, retrying" service="kube-system/kube-dns" err="failed to ensure service kube-system/kube-dns load balancers: failed to commit load balancer changes for map[string]string{\"k8s.ovn.org/kind\":\"Service\", \"k8s.ovn.org/owner\":\"kube-system/kube-dns\"}: OVN command '/usr/bin/ovn-nbctl --timeout=15 set load_balancer 3b912b21-f0c4-4afb-8906-743f13727bbc external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_TCP_cluster options:event=false options:reject=true options:skip_snat=false protocol=tcp selection_fields=[] vips={\"10.96.0.10:53\"=\"10.244.2.3:53,10.244.2.5:53\",\"10.96.0.10:9153\"=\"10.244.2.3:9153,10.244.2.5:9153\"} -- --if-exists ls-lb-del ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- --if-exists lr-lb-del GR_ovn-worker 3b912b21-f0c4-4afb-8906-743f13727bbc -- set load_balancer 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 external_ids:k8s.ovn.org/kind=Service external_ids:k8s.ovn.org/owner=kube-system/kube-dns name=Service_kube-system/kube-dns_UDP_cluster options:event=false options:reject=true options:skip_snat=false protocol=udp selection_fields=[] vips={\"10.96.0.10:53\"=\"10.244.2.3:53,10.244.2.5:53\"} -- --if-exists ls-lb-del ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2 -- --if-exists lr-lb-del GR_ovn-worker 0d2bc5d2-3687-439a-9bd1-30426dfb43c2' failed: exit status 1"
Final state:

[surya@hidden-temple contrib]$ oc get pods -A -owide
NAMESPACE     NAME                      READY   STATUS    RESTARTS   AGE    IP           NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-74ff55c5b-55ql5   1/1     Running   0          108s   10.244.2.5   ovn-worker2   <none>           <none>
kube-system   coredns-74ff55c5b-h95vc   1/1     Running   0          19m    10.244.2.3   ovn-worker2   <none>           <none>

and...

_uuid               : 3b912b21-f0c4-4afb-8906-743f13727bbc
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53", "10.96.0.10:9153"="10.244.1.3:9153,10.244.2.3:9153"}

_uuid               : 0d2bc5d2-3687-439a-9bd1-30426dfb43c2
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="kube-system/kube-dns"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_kube-system/kube-dns_UDP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : udp
selection_fields    : []
vips                : {"10.96.0.10:53"="10.244.1.3:53,10.244.2.3:53"}

As we can see, the VIPs didn't get updated to reflect the new coredns pod (10.244.2.5); they still reference the deleted node's pod IP (10.244.1.3).
So here is the main problem: when we delete a node, we remove its gateway router and switch first, then proceed to delete the pods and services that were on that node; for replicated workloads, the pods are recreated on other nodes, which requires a load_balancer update. The way I see it, we have two issues here:

A) Stale load balancers: when we delete the switch/routers, we cannot call the destroy command because we don't know whether that load balancer is shared with other switches/routers/the cluster in general. Example, a ClusterIP load balancer:

_uuid               : 9f58dc58-ab99-4356-abed-de3e2f422376
external_ids        : {"k8s.ovn.org/kind"=Service, "k8s.ovn.org/owner"="default/service-backed-server-on-ovn-worker2"}
health_check        : []
ip_port_mappings    : {}
name                : "Service_default/service-backed-server-on-ovn-worker2_TCP_cluster"
options             : {event="false", reject="true", skip_snat="false"}
protocol            : tcp
selection_fields    : []
vips                : {"10.96.155.125:80"="10.244.2.6:80"}

So instead we call the ls-lb-del and lr-lb-del commands when we encounter a node deletion, but since the switch is already gone, OVN complains and won't remove the load balancer references. We end up with stale node-specific load balancers, plus stale associations of cluster-level load balancers with the deleted switches and routers.

B) Load balancer update transactions are batched for performance reasons: we send a "set load_balancer" call to update the VIPs together with the "ls-lb-del" and "lr-lb-del" commands. The whole batch fails if any single command fails. In this case, all the delete load balancer calls fail with a "switch not found" or "router not found" error, so the VIP update is rolled back as well. (See the sketch below.)
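To make issue B concrete, here is a rough Go sketch of the shape of that batched transaction. This is illustrative only; the real batching lives in ovn-kubernetes' services controller, and the helper name here is invented.

package main

import (
	"fmt"
	"strings"
)

// buildBatchedLBUpdate mimics the shape of the exec(172) transaction in
// the logs above. All "--"-separated sub-commands run in a single
// ovn-nbctl invocation, so when "ls-lb-del ovn-worker ..." fails with
// "switch name not found" (note how --if-exists did not prevent that
// error in the logs above), the "set load_balancer" VIP update in the
// same batch never takes effect either.
func buildBatchedLBUpdate(lbUUID, node string, vips map[string]string) []string {
	args := []string{"--timeout=15", "set", "load_balancer", lbUUID}
	for vip, backends := range vips {
		args = append(args, fmt.Sprintf(`vips:"%s"="%s"`, vip, backends))
	}
	return append(args,
		"--", "--if-exists", "ls-lb-del", node, lbUUID,
		"--", "--if-exists", "lr-lb-del", "GR_"+node, lbUUID)
}

func main() {
	args := buildBatchedLBUpdate("3b912b21-f0c4-4afb-8906-743f13727bbc", "ovn-worker",
		map[string]string{"10.96.0.10:53": "10.244.2.3:53,10.244.2.5:53"})
	fmt.Println("ovn-nbctl " + strings.Join(args, " "))
}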
The solution is to check whether the switch or router still exists, and only then attempt the ls-lb-del; when the switch is already gone, there is no point in trying to delete the reference. Working on a PR, will post it shortly.
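A minimal sketch of that guard, with invented helper names (the actual change lands in the ovn-kubernetes services controller code path shown in the logs above):

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// nbExists reports whether a row named `name` exists in `table`.
// With --if-exists, `get` prints nothing (and exits 0) for a missing
// row, so empty output means the row is gone.
func nbExists(table, name string) bool {
	out, err := exec.Command("ovn-nbctl", "--if-exists", "get", table, name, "_uuid").Output()
	return err == nil && strings.TrimSpace(string(out)) != ""
}

// detachLBArgs emits ls-lb-del/lr-lb-del sub-commands only for a switch
// and gateway router that still exist, so a node that was already
// removed cannot fail the whole batched transaction.
func detachLBArgs(lbUUID, node string) []string {
	var args []string
	if nbExists("logical_switch", node) {
		args = append(args, "--", "--if-exists", "ls-lb-del", node, lbUUID)
	}
	if nbExists("logical_router", "GR_"+node) {
		args = append(args, "--", "--if-exists", "lr-lb-del", "GR_"+node, lbUUID)
	}
	return args
}

func main() {
	fmt.Println(detachLBArgs("3b912b21-f0c4-4afb-8906-743f13727bbc", "ovn-worker"))
}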
An alternative, and more accurate, solution is to remove the deleted node's switches and routers from all LBs in the LB cache. Upstream PR posted: https://github.com/ovn-org/ovn-kubernetes/pull/2457
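Roughly, the cache-based approach looks like this (a sketch with an invented cache shape, not the PR's actual types): on node delete, drop the node's switch and gateway router from every cached load balancer, so later syncs never emit deletes against objects that no longer exist.

package main

import "fmt"

// cachedLB loosely mirrors a cached load-balancer entry: which switches
// and routers it is attached to. These types are illustrative only.
type cachedLB struct {
	UUID     string
	Switches map[string]bool // logical switches this LB is attached to
	Routers  map[string]bool // gateway routers this LB is attached to
}

// removeNodeFromLBCache drops the deleted node's logical switch and
// gateway router from every cached LB, so subsequent service syncs
// compute delete operations only against objects that still exist.
func removeNodeFromLBCache(cache map[string]*cachedLB, node string) {
	for _, lb := range cache {
		delete(lb.Switches, node)
		delete(lb.Routers, "GR_"+node)
	}
}

func main() {
	cache := map[string]*cachedLB{
		"kube-dns-tcp": {
			UUID:     "3b912b21-f0c4-4afb-8906-743f13727bbc",
			Switches: map[string]bool{"ovn-worker": true, "ovn-worker2": true},
			Routers:  map[string]bool{"GR_ovn-worker": true, "GR_ovn-worker2": true},
		},
	}
	removeNodeFromLBCache(cache, "ovn-worker")
	fmt.Println(cache["kube-dns-tcp"].Switches, cache["kube-dns-tcp"].Routers)
}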
Verified this bug on 4.9.0-0.nightly-2021-09-01-193941. The VIPs are deleted after the node is deleted, and re-added when the node is rolled back.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759