Bug 2062842
Summary: | OVN-Kubernetes - Stale egress IP entries remain in NBDB after EgressIP moves to a new host, breaking ARP. Persists after DB wipe. - 4.8.29 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Will Russell <wrussell> |
Component: | Networking | Assignee: | ffernand <ffernand> |
Networking sub component: | ovn-kubernetes | QA Contact: | jechen <jechen> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | ||
Priority: | high | CC: | akaris, evadla, ffernand, fhirtz, jechen, lserot, mifiedle, trozet, vpickard |
Version: | 4.8 | Keywords: | FastFix, Triaged |
Target Milestone: | --- | ||
Target Release: | 4.8.z | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2022-04-20 12:22:15 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | 2059700 | ||
Bug Blocks: | 2056050 |
Description
Will Russell
2022-03-10 16:48:48 UTC
Verified with pre-merged image built with ovn-kubernetes#1009

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.ci.test-2022-03-24-123436-ci-ln-966pd2b-latest   True        False         9m2s    Cluster version is 4.8.0-0.ci.test-2022-03-24-123436-ci-ln-966pd2b-latest

$ oc get node
NAME              STATUS   ROLES    AGE   VERSION
compute-0         Ready    worker   24m   v1.21.8+ee73ea2
compute-1         Ready    worker   28m   v1.21.8+ee73ea2
control-plane-0   Ready    master   37m   v1.21.8+ee73ea2
control-plane-1   Ready    master   37m   v1.21.8+ee73ea2
control-plane-2   Ready    master   37m   v1.21.8+ee73ea2

$ oc get node -owide
NAME              STATUS   ROLES    AGE   VERSION           INTERNAL-IP     EXTERNAL-IP     OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
compute-0         Ready    worker   51m   v1.21.8+ee73ea2   172.31.248.48   172.31.248.48   Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
compute-1         Ready    worker   55m   v1.21.8+ee73ea2   172.31.248.51   172.31.248.51   Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
control-plane-0   Ready    master   65m   v1.21.8+ee73ea2   172.31.248.40   172.31.248.40   Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
control-plane-1   Ready    master   65m   v1.21.8+ee73ea2   172.31.248.50   172.31.248.50   Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
control-plane-2   Ready    master   64m   v1.21.8+ee73ea2   172.31.248.49   172.31.248.49   Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa)   4.18.0-305.40.2.el8_4.x86_64   cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8

$ oc label node compute-0 "k8s.ovn.org/egress-assignable"=""
node/compute-0 labeled
$ oc label node compute-1 "k8s.ovn.org/egress-assignable"=""
node/compute-1 labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 172.31.248.101
  - 172.31.248.102
  - 172.31.248.103
  namespaceSelector:
    matchLabels:
      team: red

$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-03-24T14:36:08Z"
    generation: 2
    name: egressip1
    resourceVersion: "43763"
    uid: 41aa9fb4-0381-4fbe-99fb-1275540148ed
  spec:
    egressIPs:
    - 172.31.248.101
    - 172.31.248.102
    - 172.31.248.103
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.248.101
      node: compute-1
    - egressIP: 172.31.248.102
      node: compute-0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc new-project test
$ oc label ns test team=red
$ oc create -f ./SDN-1332-test/list_for_pods.json

$ oc get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
test-rc-4jpbh   1/1     Running   0          97s   10.131.0.27   compute-1   <none>           <none>
test-rc-7rhcm   1/1     Running   0          97s   10.128.2.35   compute-0   <none>           <none>
test-rc-8cc2n   1/1     Running   0          97s   10.131.0.26   compute-1   <none>           <none>
test-rc-9cqds   1/1     Running   0          97s   10.128.2.34   compute-0   <none>           <none>
test-rc-m2fwv   1/1     Running   0          97s   10.128.2.36   compute-0   <none>           <none>
test-rc-nllv8   1/1     Running   0          97s   10.131.0.28   compute-1   <none>           <none>
test-rc-pcrpg   1/1     Running   0          97s   10.131.0.24   compute-1   <none>           <none>
test-rc-psfpw   1/1     Running   0          97s   10.128.2.32   compute-0   <none>           <none>
test-rc-qk4zl   1/1     Running   0          97s   10.128.2.33   compute-0   <none>           <none>
test-rc-sltzs   1/1     Running   0          97s   10.131.0.25   compute-1   <none>           <none>

$ oc rsh test-rc-4jpbh
~ $ while true; do curl 172.31.249.80:9095; sleep 2; echo ""; done;
172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.102^C
~ $ exit
command terminated with exit code 130

$ oc rsh test-rc-7rhcm
~ $ while true; do curl 172.31.249.80:9095; sleep 2; echo ""; done;
172.31.248.101
172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.102
172.31.248.101^C
~ $ exit
command terminated with exit code 130

$ oc get pod -n openshift-ovn-kubernetes
NAME                   READY   STATUS    RESTARTS   AGE
ovnkube-master-2bgj9   6/6     Running   6          75m
ovnkube-master-qqpt8   6/6     Running   6          75m
ovnkube-master-s5s5s   6/6     Running   0          75m
ovnkube-node-8jcnc     4/4     Running   0          75m
ovnkube-node-hgjrd     4/4     Running   0          75m
ovnkube-node-nfjqn     4/4     Running   0          75m
ovnkube-node-qzj8g     4/4     Running   0          66m
ovnkube-node-tks8s     4/4     Running   0          62m

$ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' -n openshift-ovn-kubernetes cm ovn-kubernetes-master
{"holderIdentity":"control-plane-0","leaseDurationSeconds":60,"acquireTime":"2022-03-24T13:31:41Z","renewTime":"2022-03-24T14:41:38Z","leaderTransitions":0}

$ oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=control-plane-0 -o jsonpath={.items[*].metadata.name}
ovnkube-master-s5s5s

$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-s5s5s
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep "100 "
100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.34 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.35 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.36 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.24 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.25 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.26 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.27 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.28 reroute 100.64.0.5, 100.64.0.6
sh-4.4#

$ oc debug node/jechen-0323d-4b5h8-compute-0
Starting pod/jechen-0323d-4b5h8-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.18
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot

Removing debug pod ...

$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-s5s5s
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep "100 "
100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.34 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.35 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.36 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.24 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.25 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.26 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.27 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.28 reroute 100.64.0.5, 100.64.0.6

Verified that no internal IP is missing and no duplicate record was found after the node reboot.

sh-4.4# ovn-nbctl --format=csv find nat external_ids:name=egressip1 | egrep -v "compute-1|compute-0"
_uuid,allowed_ext_ips,exempted_ext_ips,external_ids,external_ip,external_mac,external_port_range,logical_ip,logical_port,options,type

No stale NAT entry found.

Verified: Tested

*** Bug 2059706 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.37 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:1369

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
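The stale-entry check above is done by hand: dump the NAT rows for the EgressIP and `egrep -v` away the rows that reference the current egress nodes, so anything left over is stale. A minimal offline sketch of that same filter follows. All node names and CSV rows in it are invented sample data (including the `k8s-<node>` logical_port convention, which is an assumption, not something this report confirms); in a live cluster the CSV would come from `ovn-nbctl --format=csv find nat external_ids:name=egressip1` in the nbdb container and the current node list from `oc get egressip` status.

```shell
#!/bin/sh
# Sketch: flag egress-IP NAT rows whose logical_port no longer matches a
# node that currently holds an egress IP. The input mirrors the layout of
# `ovn-nbctl --format=csv find nat external_ids:name=egressip1`.

# Nodes currently listed in the EgressIP status (sample values).
current_nodes="compute-0|compute-1"

# Sample CSV dump; the third data row references a node that is gone.
csv='_uuid,external_ids,external_ip,logical_ip,logical_port,type
aaaa,{name=egressip1},172.31.248.101,10.131.0.27,k8s-compute-1,snat
bbbb,{name=egressip1},172.31.248.102,10.128.2.35,k8s-compute-0,snat
cccc,{name=egressip1},172.31.248.103,10.131.0.99,k8s-old-worker,snat'

# Drop the header, then keep only rows that do NOT reference a current
# node -- those are the stale entries this bug is about.
stale=$(printf '%s\n' "$csv" | tail -n +2 | grep -Ev "k8s-($current_nodes)," || true)

if [ -n "$stale" ]; then
  echo "stale NAT entries found:"
  printf '%s\n' "$stale"
else
  echo "no stale nat entry found"
fi
```

Before the fix, rows like the `k8s-old-worker` one would survive even a DB wipe because ovnkube-master recreated them from its in-memory cache; the healthy state is the "no stale nat entry found" branch.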