Bug 2059700 - [OVN]After reboot egress node, lr-policy-list was not correct, some duplicate records or missed internal IPs
Summary: [OVN]After reboot egress node, lr-policy-list was not correct, some duplicate records or missed internal IPs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: ffernand
QA Contact: jechen
URL:
Whiteboard:
Duplicates: 2047416
Depends On: 2059354
Blocks: 2059706 2062842
 
Reported: 2022-03-01 18:50 UTC by ffernand
Modified: 2023-09-15 01:22 UTC (History)
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2059354
Clones: 2059706
Environment:
Last Closed: 2022-04-20 14:49:50 UTC
Target Upstream Version:
Embargoed:




Links
- Github openshift ovn-kubernetes pull 981 (open): "Bug 2059700: [4.9z] After reboot egress node, lr-policy-list was not correct, some duplicate records or missed internal ..." (last updated 2022-03-01 21:03:53 UTC)
- Red Hat Product Errata RHSA-2022:1363 (Waiting on Red Hat): "[BUG] Case to Track Long Term Solution of Issue Reported in Case 03054579" (last updated 2022-04-29 11:43:18 UTC)

Comment 2 jechen 2022-03-28 22:08:32 UTC
Verified with pre-merged image built with ovn-kubernetes#981

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2022-03-28-203944-ci-ln-pn3g9qt-latest   True        False         29m     Cluster version is 4.9.0-0.ci.test-2022-03-28-203944-ci-ln-pn3g9qt-latest


$ oc get node
NAME              STATUS   ROLES    AGE   VERSION
compute-0         Ready    worker   38m   v1.22.5+5c84e52
compute-1         Ready    worker   38m   v1.22.5+5c84e52
control-plane-0   Ready    master   51m   v1.22.5+5c84e52
control-plane-1   Ready    master   51m   v1.22.5+5c84e52
control-plane-2   Ready    master   51m   v1.22.5+5c84e52


$ oc label node compute-0 "k8s.ovn.org/egress-assignable"=""
node/compute-0 labeled

$ oc label node compute-1 "k8s.ovn.org/egress-assignable"=""
node/compute-1 labeled
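
Both workers should now carry the label; a quick way to confirm the egress-assignable set is a label-existence selector:

$ oc get node -l k8s.ovn.org/egress-assignable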

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 172.31.248.101
  - 172.31.248.102
  - 172.31.248.103
  namespaceSelector:
    matchLabels:
      team: red 

$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$  oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-03-28T21:54:26Z"
    generation: 2
    name: egressip1
    resourceVersion: "34665"
    uid: c5b65f94-383f-4fa1-9680-0152dfb6c83a
  spec:
    egressIPs:
    - 172.31.248.101
    - 172.31.248.102
    - 172.31.248.103
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.248.101
      node: compute-0
    - egressIP: 172.31.248.102
      node: compute-1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
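
The assignments can also be read straight out of the status with jsonpath, as a quicker equivalent of the -oyaml output above:

$ oc get egressip egressip1 -o jsonpath='{range .status.items[*]}{.egressIP} -> {.node}{"\n"}{end}'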


$ oc new-project test

$ oc label ns test team=red
namespace/test labeled

$ oc create -f ./SDN-1332-test/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created

$ oc get pod
NAME            READY   STATUS    RESTARTS   AGE
test-rc-99qdz   1/1     Running   0          12s
test-rc-bzq9x   1/1     Running   0          12s
test-rc-cbtzh   1/1     Running   0          12s
test-rc-frqnd   1/1     Running   0          12s
test-rc-jlnsh   1/1     Running   0          12s
test-rc-kmnh9   1/1     Running   0          12s
test-rc-kvsv7   1/1     Running   0          12s
test-rc-qs8gc   1/1     Running   0          12s
test-rc-snvb7   1/1     Running   0          12s
test-rc-wmrwh   1/1     Running   0          12s


$ oc rsh test-rc-99qdz
~ $ while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.102
172.31.248.102
172.31.248.102^C
~ $ exit
command terminated with exit code 130
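
The curl target 172.31.249.80:9095 is evidently a host outside the cluster that replies with the client's source address, which is why the loop above prints the egress IPs in use. A minimal stand-in for such an echo service (an assumption; not necessarily the service used in this test) can be run on the external host with ncat:

$ ncat -lk 9095 --sh-exec \
    'printf "HTTP/1.1 200 OK\r\nContent-Length: ${#NCAT_REMOTE_ADDR}\r\n\r\n%s" "$NCAT_REMOTE_ADDR"'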

$ oc get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
test-rc-99qdz   1/1     Running   0          62s   10.131.0.29   compute-0   <none>           <none>
test-rc-bzq9x   1/1     Running   0          62s   10.128.2.30   compute-1   <none>           <none>
test-rc-cbtzh   1/1     Running   0          62s   10.131.0.30   compute-0   <none>           <none>
test-rc-frqnd   1/1     Running   0          62s   10.128.2.33   compute-1   <none>           <none>
test-rc-jlnsh   1/1     Running   0          62s   10.131.0.28   compute-0   <none>           <none>
test-rc-kmnh9   1/1     Running   0          62s   10.128.2.29   compute-1   <none>           <none>
test-rc-kvsv7   1/1     Running   0          62s   10.131.0.31   compute-0   <none>           <none>
test-rc-qs8gc   1/1     Running   0          62s   10.131.0.32   compute-0   <none>           <none>
test-rc-snvb7   1/1     Running   0          62s   10.128.2.31   compute-1   <none>           <none>
test-rc-wmrwh   1/1     Running   0          62s   10.128.2.32   compute-1   <none>           <none>

$ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'  -n openshift-ovn-kubernetes  cm ovn-kubernetes-master
{"holderIdentity":"control-plane-2","leaseDurationSeconds":60,"acquireTime":"2022-03-28T21:03:13Z","renewTime":"2022-03-28T21:56:31Z","leaderTransitions":0}

$ oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=control-plane-2 -o jsonpath={.items[*].metadata.name}
ovnkube-master-zvvb8

$ oc -n openshift-ovn-kubernetes rsh  ovnkube-master-zvvb8
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.32         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.33         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.28         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.32         reroute                100.64.0.5, 100.64.0.6
sh-4.4# 
sh-4.4# exit
exit


$ oc debug node/compute-0
Starting pod/compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.31
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot
Terminated
sh-4.4# 
Removing debug pod ...

$ oc -n openshift-ovn-kubernetes rsh  ovnkube-master-zvvb8
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.32         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.33         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.28         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.32         reroute                100.64.0.5, 100.64.0.6
sh-4.4# ^C

==> After the node reboot the lr-policy-list is unchanged: each of the ten test pod IPs still has exactly one priority-100 reroute policy, with no duplicate records and no missing internal IPs.
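
The same two checks can be done mechanically; a sketch, assuming ovn-nbctl is reached through oc exec into the same ovnkube-master pod and that the test pods are the only priority-100 reroute sources:

# duplicates: any IP printed here appears in more than one priority-100 policy
$ oc -n openshift-ovn-kubernetes exec ovnkube-master-zvvb8 -c northd -- \
    ovn-nbctl lr-policy-list ovn_cluster_router | awk '$1 == 100 {print $4}' | sort | uniq -d

# missing: pod IPs in the test namespace that have no reroute policy
$ oc get pod -n test -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort > /tmp/pod-ips
$ oc -n openshift-ovn-kubernetes exec ovnkube-master-zvvb8 -c northd -- \
    ovn-nbctl lr-policy-list ovn_cluster_router | awk '$1 == 100 {print $4}' | sort > /tmp/policy-ips
$ comm -23 /tmp/pod-ips /tmp/policy-ips

Empty output from both means no duplicate records and no missing internal IPs.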

Comment 6 ffernand 2022-04-08 20:03:04 UTC
*** Bug 2047416 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2022-04-20 14:49:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.29 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1363

Comment 13 Red Hat Bugzilla 2023-09-15 01:22:29 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days

