Bug 2063321 - [OVN]After reboot egress node, lr-policy-list was not correct, some duplicate records or missed internal IPs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: ffernand
QA Contact: jechen
URL:
Whiteboard:
Depends On:
Blocks: 2059354
 
Reported: 2022-03-11 19:05 UTC by ffernand
Modified: 2022-08-10 10:54 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 2056050
Environment:
Last Closed: 2022-08-10 10:54:04 UTC
Target Upstream Version:
Embargoed:




Links
System                  ID                                  Private  Priority  Status  Summary                                                     Last Updated
Github                  openshift ovn-kubernetes pull 1000  0        None      open    Bug 2063321: [DownstreamMerge] Downstream merge 17-03-2022  2022-03-17 19:24:58 UTC
Red Hat Product Errata  RHSA-2022:5069                      0        None      None    None                                                        2022-08-10 10:54:22 UTC

Comment 6 Tim Rozet 2022-03-25 14:56:48 UTC
The fix had a regression that is being handled by:
https://github.com/ovn-org/ovn-kubernetes/pull/2873

Moving back to POST.

Comment 7 jechen 2022-03-25 20:57:23 UTC
Verified with 4.11.0-0.nightly-2022-03-23-132952

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-03-23-132952   True        False         126m    Cluster version is 4.11.0-0.nightly-2022-03-23-132952

$ oc get node
NAME                                 STATUS   ROLES    AGE   VERSION
jechen-0325f-6sss4-compute-0         Ready    worker   31m   v1.23.3+b085777
jechen-0325f-6sss4-compute-1         Ready    worker   31m   v1.23.3+b085777
jechen-0325f-6sss4-control-plane-0   Ready    master   43m   v1.23.3+b085777
jechen-0325f-6sss4-control-plane-1   Ready    master   43m   v1.23.3+b085777
jechen-0325f-6sss4-control-plane-2   Ready    master   42m   v1.23.3+b085777


$ oc label node jechen-0325f-6sss4-compute-0 "k8s.ovn.org/egress-assignable"=""
node/jechen-0325f-6sss4-compute-0 labeled

$ oc label node jechen-0325f-6sss4-compute-1 "k8s.ovn.org/egress-assignable"=""
node/jechen-0325f-6sss4-compute-1 labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 172.31.248.101
  - 172.31.248.102
  - 172.31.248.103
  namespaceSelector:
    matchLabels:
      team: red 


$  oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created


$ oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-03-25T20:40:58Z"
    generation: 2
    name: egressip1
    resourceVersion: "76047"
    uid: 05d9d935-a2ad-43b7-8780-a80ebfecaf11
  spec:
    egressIPs:
    - 172.31.248.101
    - 172.31.248.102
    - 172.31.248.103
    namespaceSelector:
      matchLabels:
        team: red
  status:
    items:
    - egressIP: 172.31.248.103
      node: jechen-0325f-6sss4-compute-0
    - egressIP: 172.31.248.101
      node: jechen-0325f-6sss4-compute-1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""



$ oc new-project test
$ oc label ns test team=red
namespace/test labeled

$ oc create -f ./SDN-1332-test/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created


$ oc get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE                           NOMINATED NODE   READINESS GATES
test-rc-4fm7l   1/1     Running   0          45s   10.131.0.15   jechen-0325f-6sss4-compute-1   <none>           <none>
test-rc-52s82   1/1     Running   0          45s   10.131.0.16   jechen-0325f-6sss4-compute-1   <none>           <none>
test-rc-5br68   1/1     Running   0          45s   10.128.2.35   jechen-0325f-6sss4-compute-0   <none>           <none>
test-rc-6fcxk   1/1     Running   0          45s   10.128.2.36   jechen-0325f-6sss4-compute-0   <none>           <none>
test-rc-8pw4s   1/1     Running   0          45s   10.131.0.18   jechen-0325f-6sss4-compute-1   <none>           <none>
test-rc-fjd9x   1/1     Running   0          45s   10.131.0.17   jechen-0325f-6sss4-compute-1   <none>           <none>
test-rc-gjzwn   1/1     Running   0          45s   10.128.2.32   jechen-0325f-6sss4-compute-0   <none>           <none>
test-rc-pxvrv   1/1     Running   0          45s   10.128.2.34   jechen-0325f-6sss4-compute-0   <none>           <none>
test-rc-r2whh   1/1     Running   0          45s   10.131.0.14   jechen-0325f-6sss4-compute-1   <none>           <none>
test-rc-wz4kz   1/1     Running   0          45s   10.128.2.33   jechen-0325f-6sss4-compute-0   <none>           <none>


$ oc rsh test-rc-4fm7l
~ $ while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.103
172.31.248.103
172.31.248.103
172.31.248.101
172.31.248.103
172.31.248.101
172.31.248.101^C
~ $ exit
command terminated with exit code 130


$ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'  -n openshift-ovn-kubernetes  cm ovn-kubernetes-master
{"holderIdentity":"jechen-0325f-6sss4-control-plane-1","leaseDurationSeconds":60,"acquireTime":"2022-03-25T18:14:10Z","renewTime":"2022-03-25T20:44:54Z","leaderTransitions":0}


$  oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=jechen-0325f-6sss4-control-plane-1 -o jsonpath={.items[*].metadata.name}
ovnkube-master-hk4tt


$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-hk4tt
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# 
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.32         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.33         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.34         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.35         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.36         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.14         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.15         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.16         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.17         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.18         reroute                100.64.0.5, 100.64.0.6


$ oc debug node/jechen-0325f-6sss4-compute-1
Starting pod/jechen-0325f-6sss4-compute-1-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.11
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot

Removing debug pod ...


$ oc -n openshift-ovn-kubernetes rsh ovnkube-master-hk4tt
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.32         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.33         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.34         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.35         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.36         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.14         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.15         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.16         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.17         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.18         reroute                100.64.0.5, 100.64.0.6
sh-4.4# 
sh-4.4# 
sh-4.4# exit
exit

Verified: after rebooting the egress node, lr-policy-list shows no missing internal IPs and no duplicate records.
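
The comparison of the two lr-policy-list dumps above was done by eye. A minimal sketch of how it could be scripted follows, assuming the output format shown in this comment (priority 100, a match of the form "ip4.src == <pod IP>"). The function name check_reroute_policies is hypothetical and not part of ovn-nbctl or oc; it reads the policy list on stdin and takes the expected pod IPs as arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ovn-nbctl or oc).
# Reads `ovn-nbctl lr-policy-list <router>` output on stdin and checks the
# priority-100 "ip4.src == <ip>" policies against the pod IPs passed as arguments.
# Prints "duplicate: <ip>" for an IP listed more than once and "missing: <ip>"
# for an expected IP with no policy; returns 1 if anything was reported.
check_reroute_policies() {
  awk -v expected="$*" '
    # Split the space-separated list of expected pod IPs into want[1..n].
    BEGIN { n = split(expected, want, " ") }
    # Count each source IP; report it once, when its second policy appears.
    $1 == 100 && $2 == "ip4.src" && $3 == "==" {
      if (++seen[$4] == 2) { print "duplicate: " $4; bad = 1 }
    }
    # Report expected IPs that never appeared, in argument order.
    END {
      for (i = 1; i <= n; i++)
        if (!(want[i] in seen)) { print "missing: " want[i]; bad = 1 }
      exit (bad ? 1 : 0)
    }
  '
}
```

For example, with the policy list saved to a file (policies.txt is an assumed name), `check_reroute_policies 10.128.2.32 10.128.2.33 < policies.txt` prints nothing and returns 0 when each listed pod IP has exactly one policy.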

Comment 9 errata-xmlrpc 2022-08-10 10:54:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

