Bug 2062842
| Summary: | OVN-Kubernetes - Stale Egress IP entries remain in NBDB after eip moves to new host, breaking arp. Persists after DB wipe. - 4.8.29 | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Will Russell <wrussell> |
| Component: | Networking | Assignee: | ffernand <ffernand> |
| Networking sub component: | ovn-kubernetes | QA Contact: | jechen <jechen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | ||
| Priority: | high | CC: | akaris, evadla, ffernand, fhirtz, jechen, lserot, mifiedle, trozet, vpickard |
| Version: | 4.8 | Keywords: | FastFix, Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.8.z | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Last Closed: | 2022-04-20 12:22:15 UTC | Type: | Bug |
| Bug Depends On: | 2059700 | ||
| Bug Blocks: | 2056050 | ||
Description
Will Russell
2022-03-10 16:48:48 UTC
Verified with pre-merged image built with ovn-kubernetes#1009
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.0-0.ci.test-2022-03-24-123436-ci-ln-966pd2b-latest True False 9m2s Cluster version is 4.8.0-0.ci.test-2022-03-24-123436-ci-ln-966pd2b-latest
$ oc get node
NAME STATUS ROLES AGE VERSION
compute-0 Ready worker 24m v1.21.8+ee73ea2
compute-1 Ready worker 28m v1.21.8+ee73ea2
control-plane-0 Ready master 37m v1.21.8+ee73ea2
control-plane-1 Ready master 37m v1.21.8+ee73ea2
control-plane-2 Ready master 37m v1.21.8+ee73ea2
$ oc get node -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
compute-0 Ready worker 51m v1.21.8+ee73ea2 172.31.248.48 172.31.248.48 Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa) 4.18.0-305.40.2.el8_4.x86_64 cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
compute-1 Ready worker 55m v1.21.8+ee73ea2 172.31.248.51 172.31.248.51 Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa) 4.18.0-305.40.2.el8_4.x86_64 cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
control-plane-0 Ready master 65m v1.21.8+ee73ea2 172.31.248.40 172.31.248.40 Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa) 4.18.0-305.40.2.el8_4.x86_64 cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
control-plane-1 Ready master 65m v1.21.8+ee73ea2 172.31.248.50 172.31.248.50 Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa) 4.18.0-305.40.2.el8_4.x86_64 cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
control-plane-2 Ready master 64m v1.21.8+ee73ea2 172.31.248.49 172.31.248.49 Red Hat Enterprise Linux CoreOS 48.84.202203221810-0 (Ootpa) 4.18.0-305.40.2.el8_4.x86_64 cri-o://1.21.6-2.rhaos4.8.gitb948fcd.el8
$ oc label node compute-0 "k8s.ovn.org/egress-assignable"=""
node/compute-0 labeled
$ oc label node compute-1 "k8s.ovn.org/egress-assignable"=""
node/compute-1 labeled
$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 172.31.248.101
  - 172.31.248.102
  - 172.31.248.103
  namespaceSelector:
    matchLabels:
      team: red
$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created
$ oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-03-24T14:36:08Z"
    generation: 2
    name: egressip1
    resourceVersion: "43763"
    uid: 41aa9fb4-0381-4fbe-99fb-1275540148ed
  spec:
    egressIPs:
    - 172.31.248.101
    - 172.31.248.102
    - 172.31.248.103
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.248.101
      node: compute-1
    - egressIP: 172.31.248.102
      node: compute-0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
$ oc new-project test
$ oc label ns test team=red
$ oc create -f ./SDN-1332-test/list_for_pods.json
$ oc get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-rc-4jpbh 1/1 Running 0 97s 10.131.0.27 compute-1 <none> <none>
test-rc-7rhcm 1/1 Running 0 97s 10.128.2.35 compute-0 <none> <none>
test-rc-8cc2n 1/1 Running 0 97s 10.131.0.26 compute-1 <none> <none>
test-rc-9cqds 1/1 Running 0 97s 10.128.2.34 compute-0 <none> <none>
test-rc-m2fwv 1/1 Running 0 97s 10.128.2.36 compute-0 <none> <none>
test-rc-nllv8 1/1 Running 0 97s 10.131.0.28 compute-1 <none> <none>
test-rc-pcrpg 1/1 Running 0 97s 10.131.0.24 compute-1 <none> <none>
test-rc-psfpw 1/1 Running 0 97s 10.128.2.32 compute-0 <none> <none>
test-rc-qk4zl 1/1 Running 0 97s 10.128.2.33 compute-0 <none> <none>
test-rc-sltzs 1/1 Running 0 97s 10.131.0.25 compute-1 <none> <none>
$ oc rsh test-rc-4jpbh
~ $ while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.102^C
~ $ exit
command terminated with exit code 130
$ oc rsh test-rc-7rhcm
~ $ while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.101
172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.102
172.31.248.101^C
~ $ exit
command terminated with exit code 130
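The curl loops above can also be checked mechanically. The following is a minimal sketch, assuming the service at 172.31.249.80:9095 echoes back the caller's source IP (which the output suggests). The function name `tally_egress_sources` is hypothetical and not part of any OpenShift tooling.

```python
from collections import Counter


def tally_egress_sources(curl_output, assigned_ips):
    """Count each source IP seen in the curl output and flag unexpected ones.

    Hypothetical helper: curl_output is the pasted loop output, assigned_ips
    is the set of egress IPs listed in the EgressIP status.
    """
    seen = Counter()
    for line in curl_output.splitlines():
        # The transcript ends with a Ctrl-C marker glued to the last IP.
        ip = line.strip().replace("^C", "")
        if ip:
            seen[ip] += 1
    unexpected = set(seen) - set(assigned_ips)
    return seen, unexpected


# Output of the first curl loop, copied from the transcript.
sample = """172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.102^C"""

counts, unexpected = tally_egress_sources(
    sample, {"172.31.248.101", "172.31.248.102"}
)
print(dict(counts), unexpected)
```

An empty `unexpected` set means every request left the cluster with one of the assigned egress IPs.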
$ oc get pod -n openshift-ovn-kubernetes
NAME READY STATUS RESTARTS AGE
ovnkube-master-2bgj9 6/6 Running 6 75m
ovnkube-master-qqpt8 6/6 Running 6 75m
ovnkube-master-s5s5s 6/6 Running 0 75m
ovnkube-node-8jcnc 4/4 Running 0 75m
ovnkube-node-hgjrd 4/4 Running 0 75m
ovnkube-node-nfjqn 4/4 Running 0 75m
ovnkube-node-qzj8g 4/4 Running 0 66m
ovnkube-node-tks8s 4/4 Running 0 62m
$ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' -n openshift-ovn-kubernetes cm ovn-kubernetes-master
{"holderIdentity":"control-plane-0","leaseDurationSeconds":60,"acquireTime":"2022-03-24T13:31:41Z","renewTime":"2022-03-24T14:41:38Z","leaderTransitions":0}
$ oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=control-plane-0 -o jsonpath={.items[*].metadata.name}
ovnkube-master-s5s5s
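The leader lookup above reads a JSON annotation from the `ovn-kubernetes-master` config map and then finds the master pod on that node. A small sketch of the first step, using the annotation value shown in the transcript, could look like this. The function name `leader_node` is hypothetical.

```python
import json


def leader_node(annotation):
    """Return the node named in the leader-election annotation (hypothetical helper)."""
    return json.loads(annotation)["holderIdentity"]


# Annotation value copied from the transcript above.
annotation = (
    '{"holderIdentity":"control-plane-0","leaseDurationSeconds":60,'
    '"acquireTime":"2022-03-24T13:31:41Z","renewTime":"2022-03-24T14:41:38Z",'
    '"leaderTransitions":0}'
)
print(leader_node(annotation))
```

The returned node name is what the transcript then passes to `--field-selector=spec.nodeName=...` to locate the leader pod.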
$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-s5s5s
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep "100 "
100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.34 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.35 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.36 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.24 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.25 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.26 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.27 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.28 reroute 100.64.0.5, 100.64.0.6
sh-4.4#
$ oc debug node/jechen-0323d-4b5h8-compute-0
Starting pod/jechen-0323d-4b5h8-compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.18
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot
Removing debug pod ...
$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-s5s5s
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep "100 "
100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.34 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.35 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.36 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.24 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.25 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.26 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.27 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.28 reroute 100.64.0.5, 100.64.0.6
Verified that no pod internal IP is missing and no duplicate record exists: the policy list after the reboot matches the list taken before it.
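The manual comparison of the priority-100 reroute policies against the pod IPs could be scripted. This is a rough sketch, assuming the `ovn-nbctl lr-policy-list` line layout shown in the transcript. The helper name `check_reroute_policies` and the regular expression are illustrative assumptions, not part of OVN.

```python
import re
from collections import Counter

# Matches lines such as:
#   100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
POLICY_RE = re.compile(r"^\s*100\s+ip4\.src\s*==\s*(\S+)\s+reroute\s+(.+)$")


def check_reroute_policies(policy_output, pod_ips):
    """Compare reroute policies with the expected pod IPs (hypothetical helper).

    Returns three sorted lists: pod IPs with no policy, IPs with more than
    one policy, and policy IPs that belong to no listed pod.
    """
    matched = Counter()
    for line in policy_output.splitlines():
        m = POLICY_RE.match(line)
        if m:
            matched[m.group(1)] += 1
    missing = sorted(set(pod_ips) - set(matched))
    duplicates = sorted(ip for ip, n in matched.items() if n > 1)
    unknown = sorted(set(matched) - set(pod_ips))
    return missing, duplicates, unknown


# Two of the policy lines from the transcript, with the matching pod IPs.
policies = """100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.24 reroute 100.64.0.5, 100.64.0.6"""
print(check_reroute_policies(policies, ["10.128.2.32", "10.131.0.24"]))
```

Three empty lists correspond to the result reported here: nothing missing, nothing duplicated, and no policy left over for a pod that no longer exists.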
sh-4.4# ovn-nbctl --format=csv find nat external_ids:name=egressip1 | egrep -v "compute-1|compute-0"
_uuid,allowed_ext_ips,exempted_ext_ips,external_ids,external_ip,external_mac,external_port_range,logical_ip,logical_port,options,type
No stale NAT entry found: the query returned only the CSV header.
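The `egrep -v` filter above drops every NAT row that mentions one of the egress-assignable nodes, so anything left over is a candidate stale entry. A sketch of the same filter over the CSV output follows. The function name `stale_nat_rows` is hypothetical, and matching the node name anywhere in the row is an assumption that mirrors the `egrep` pattern.

```python
import csv
import io


def stale_nat_rows(csv_output, live_nodes):
    """Return NAT rows that mention none of the live nodes (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_output))
    stale = []
    for row in reader:
        # Same idea as `egrep -v`: search the whole row text for a node name.
        text = " ".join(str(v) for v in row.values())
        if not any(node in text for node in live_nodes):
            stale.append(row)
    return stale


# Header line copied from the transcript; the query returned no data rows.
header = (
    "_uuid,allowed_ext_ips,exempted_ext_ips,external_ids,external_ip,"
    "external_mac,external_port_range,logical_ip,logical_port,options,type"
)
print(stale_nat_rows(header + "\n", ["compute-0", "compute-1"]))
```

With header-only input, as in the transcript, the result is an empty list, which matches the "no stale entry" conclusion.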
Verified: Tested
*** Bug 2059706 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.8.37 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2022:1369
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.