Bug 2063321
| Summary: | [OVN] After rebooting an egress node, lr-policy-list was incorrect: duplicate records or missing internal IPs | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | ffernand <ffernand> |
| Component: | Networking | Assignee: | ffernand <ffernand> |
| Networking sub component: | ovn-kubernetes | QA Contact: | jechen <jechen> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | anbhat, andcosta, anusaxen, ffernand, fhirtz, huirwang, jechen, jocolema, openshift-bugzilla-robot, surya, trozet, wrussell, zzhao |
| Version: | 4.7 | Keywords: | Triaged |
| Target Milestone: | --- | ||
| Target Release: | 4.11.0 | ||
| Hardware: | x86_64 | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | No Doc Update | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | 2056050 | Environment: | |
| Last Closed: | 2022-08-10 10:54:04 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2059354 | ||
|
Comment 6
Tim Rozet
2022-03-25 14:56:48 UTC
Verified with 4.11.0-0.nightly-2022-03-23-132952
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-03-23-132952 True False 126m Cluster version is 4.11.0-0.nightly-2022-03-23-132952
$ oc get node
NAME STATUS ROLES AGE VERSION
jechen-0325f-6sss4-compute-0 Ready worker 31m v1.23.3+b085777
jechen-0325f-6sss4-compute-1 Ready worker 31m v1.23.3+b085777
jechen-0325f-6sss4-control-plane-0 Ready master 43m v1.23.3+b085777
jechen-0325f-6sss4-control-plane-1 Ready master 43m v1.23.3+b085777
jechen-0325f-6sss4-control-plane-2 Ready master 42m v1.23.3+b085777
$ oc label node jechen-0325f-6sss4-compute-0 "k8s.ovn.org/egress-assignable"=""
node/jechen-0325f-6sss4-compute-0 labeled
$ oc label node jechen-0325f-6sss4-compute-1 "k8s.ovn.org/egress-assignable"=""
node/jechen-0325f-6sss4-compute-1 labeled
$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
name: egressip1
spec:
egressIPs:
- 172.31.248.101
- 172.31.248.102
- 172.31.248.103
namespaceSelector:
matchLabels:
team: red
$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created
$ oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
creationTimestamp: "2022-03-25T20:40:58Z"
generation: 2
name: egressip1
resourceVersion: "76047"
uid: 05d9d935-a2ad-43b7-8780-a80ebfecaf11
spec:
egressIPs:
- 172.31.248.101
- 172.31.248.102
- 172.31.248.103
namespaceSelector:
matchLabels:
team: red
status:
items:
- egressIP: 172.31.248.103
node: jechen-0325f-6sss4-compute-0
- egressIP: 172.31.248.101
node: jechen-0325f-6sss4-compute-1
kind: List
metadata:
resourceVersion: ""
selfLink: ""
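The status above shows two of the three spec'd egress IPs assigned, one per labeled node. A minimal sketch (POSIX sh; the IP lists are sample data copied from the EgressIP output above) of the sanity check that every assigned status IP actually comes from the spec list:

```shell
# Sample data copied from the EgressIP output above.
spec_ips="172.31.248.101 172.31.248.102 172.31.248.103"
status_ips="172.31.248.103 172.31.248.101"

# Any status IP that is not in the spec would indicate a stray assignment.
unexpected=""
for ip in $status_ips; do
  case " $spec_ips " in
    *" $ip "*) ;;                        # assigned IP is listed in the spec
    *) unexpected="$unexpected $ip" ;;   # not in spec: flag it
  esac
done
echo "unexpected assignments:${unexpected:- none}"
```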
$ oc new-project test
$ oc label ns test team=red
namespace/test labeled
$ oc create -f ./SDN-1332-test/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created
$ oc get pod -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-rc-4fm7l 1/1 Running 0 45s 10.131.0.15 jechen-0325f-6sss4-compute-1 <none> <none>
test-rc-52s82 1/1 Running 0 45s 10.131.0.16 jechen-0325f-6sss4-compute-1 <none> <none>
test-rc-5br68 1/1 Running 0 45s 10.128.2.35 jechen-0325f-6sss4-compute-0 <none> <none>
test-rc-6fcxk 1/1 Running 0 45s 10.128.2.36 jechen-0325f-6sss4-compute-0 <none> <none>
test-rc-8pw4s 1/1 Running 0 45s 10.131.0.18 jechen-0325f-6sss4-compute-1 <none> <none>
test-rc-fjd9x 1/1 Running 0 45s 10.131.0.17 jechen-0325f-6sss4-compute-1 <none> <none>
test-rc-gjzwn 1/1 Running 0 45s 10.128.2.32 jechen-0325f-6sss4-compute-0 <none> <none>
test-rc-pxvrv 1/1 Running 0 45s 10.128.2.34 jechen-0325f-6sss4-compute-0 <none> <none>
test-rc-r2whh 1/1 Running 0 45s 10.131.0.14 jechen-0325f-6sss4-compute-1 <none> <none>
test-rc-wz4kz 1/1 Running 0 45s 10.128.2.33 jechen-0325f-6sss4-compute-0 <none> <none>
$ oc rsh test-rc-4fm7l
~ $ while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.103
172.31.248.103
172.31.248.103
172.31.248.101
172.31.248.103
172.31.248.101
172.31.248.101^C
~ $ exit
command terminated with exit code 130
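The curl loop above shows the pod's traffic egressing through the assigned egress IPs, alternating between them. A small sketch (POSIX sh; the responses are a sample copied from the loop above) that tallies the distinct source IPs the external host observed:

```shell
# Responses observed by the external host (sample copied from the loop above).
observed="172.31.248.103
172.31.248.103
172.31.248.103
172.31.248.101
172.31.248.103
172.31.248.101
172.31.248.101"

# Count the distinct source IPs -- they should only ever be assigned egress IPs.
count=$(printf '%s\n' "$observed" | sort -u | wc -l)
echo "distinct egress IPs seen: $count"
```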
$ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}' -n openshift-ovn-kubernetes cm ovn-kubernetes-master
{"holderIdentity":"jechen-0325f-6sss4-control-plane-1","leaseDurationSeconds":60,"acquireTime":"2022-03-25T18:14:10Z","renewTime":"2022-03-25T20:44:54Z","leaderTransitions":0}
$ oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=jechen-0325f-6sss4-control-plane-1 -o jsonpath={.items[*].metadata.name}
ovnkube-master-hk4tt
$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-hk4tt
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4#
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep "100 "
100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.34 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.35 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.36 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.14 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.15 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.16 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.17 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.18 reroute 100.64.0.5, 100.64.0.6
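The consistency check behind this verification is that the set of `ip4.src` addresses in the priority-100 reroute policies must exactly match the set of pod IPs in the selected namespace: no duplicate rows, no pod left out. A sketch of that check (POSIX sh; the IP lists are abbreviated sample data copied from the outputs above, where the real check would cover all ten pods):

```shell
# Abbreviated sample data copied from the pod list and policy list above.
pod_ips="10.128.2.32 10.128.2.33 10.131.0.14"
policy_srcs="10.128.2.32 10.128.2.33 10.131.0.14"

# Duplicate policy rows would surface here (one failure mode this bug describes).
dupes=$(printf '%s\n' $policy_srcs | sort | uniq -d)

# A pod IP with no matching policy row is the "missed internal IP" failure mode.
missing=""
for ip in $pod_ips; do
  case " $policy_srcs " in
    *" $ip "*) ;;
    *) missing="$missing $ip" ;;
  esac
done
echo "duplicates: '${dupes}' missing: '${missing}'"
```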
$ oc debug node/jechen-0325f-6sss4-compute-1
Starting pod/jechen-0325f-6sss4-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.11
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot
Removing debug pod ...
$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-hk4tt
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router | grep "100 "
100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.34 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.35 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.36 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.14 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.15 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.16 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.17 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.131.0.18 reroute 100.64.0.5, 100.64.0.6
sh-4.4#
sh-4.4#
sh-4.4# exit
exit
Verified: no missing internal IPs or duplicate records found.
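The verification above boils down to comparing the policy list captured before the node reboot with the one captured afterwards. A trivial sketch of that comparison (POSIX sh; two sample rows copied from the output above stand in for captures that would really come from `ovn-nbctl`):

```shell
# Policy rows captured before the reboot (sample copied from the output above).
before="100 ip4.src == 10.128.2.32 reroute 100.64.0.5, 100.64.0.6
100 ip4.src == 10.128.2.33 reroute 100.64.0.5, 100.64.0.6"

# Mirrors the observed result: the post-reboot capture was identical.
after="$before"

if [ "$before" = "$after" ]; then
  echo "policy list unchanged after reboot"
else
  printf 'policy list drifted:\n%s\n' "$after"
fi
```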
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069