Bug 2059700 - [OVN]After reboot egress node, lr-policy-list was not correct, some duplicate records or missed internal IPs [NEEDINFO]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.z
Assignee: ffernand
QA Contact: jechen
URL:
Whiteboard:
Duplicates: 2047416 (view as bug list)
Depends On: 2059354
Blocks: 2059706 2062842
 
Reported: 2022-03-01 18:50 UTC by ffernand
Modified: 2022-04-20 14:50 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2059354
Clones: 2059706 (view as bug list)
Environment:
Last Closed: 2022-04-20 14:49:50 UTC
Target Upstream Version:
fhirtz: needinfo? (ffernand)


Links
System ID Private Priority Status Summary Last Updated
Github openshift ovn-kubernetes pull 981 0 None open Bug 2059700: [4.9z] After reboot egress node, lr-policy-list was not correct, some duplicate records or missed internal ... 2022-03-01 21:03:53 UTC
Red Hat Product Errata RHSA-2022:1363 0 None Waiting on Red Hat [BUG] Case to Track Long Term Solution of Issue Reported in Case 03054579 2022-04-29 11:43:18 UTC

Comment 2 jechen 2022-03-28 22:08:32 UTC
Verified with pre-merged image built with ovn-kubernetes#981

$ oc get clusterversion
NAME      VERSION                                                  AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.ci.test-2022-03-28-203944-ci-ln-pn3g9qt-latest   True        False         29m     Cluster version is 4.9.0-0.ci.test-2022-03-28-203944-ci-ln-pn3g9qt-latest


$ oc get node
NAME              STATUS   ROLES    AGE   VERSION
compute-0         Ready    worker   38m   v1.22.5+5c84e52
compute-1         Ready    worker   38m   v1.22.5+5c84e52
control-plane-0   Ready    master   51m   v1.22.5+5c84e52
control-plane-1   Ready    master   51m   v1.22.5+5c84e52
control-plane-2   Ready    master   51m   v1.22.5+5c84e52


$ oc label node compute-0 "k8s.ovn.org/egress-assignable"=""
node/compute-0 labeled

$ oc label node compute-1 "k8s.ovn.org/egress-assignable"=""
node/compute-1 labeled

$ cat config_egressip1_ovn_ns_team_red.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip1
spec:
  egressIPs:
  - 172.31.248.101
  - 172.31.248.102
  - 172.31.248.103
  namespaceSelector:
    matchLabels:
      team: red 

$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$  oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-03-28T21:54:26Z"
    generation: 2
    name: egressip1
    resourceVersion: "34665"
    uid: c5b65f94-383f-4fa1-9680-0152dfb6c83a
  spec:
    egressIPs:
    - 172.31.248.101
    - 172.31.248.102
    - 172.31.248.103
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.248.101
      node: compute-0
    - egressIP: 172.31.248.102
      node: compute-1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

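Note that only two of the three egress IPs in the spec show up in the status: an EgressIP object is assigned at most one IP per egress-assignable node, and only compute-0 and compute-1 are labeled. A minimal sketch of that observed behavior (this is an illustration, not ovn-kubernetes's actual assignment code; `assign_egress_ips` is a hypothetical helper):

```python
def assign_egress_ips(egress_ips, assignable_nodes):
    # At most one IP from this EgressIP object lands on each
    # egress-assignable node, so with more IPs than nodes the
    # surplus IPs simply stay unassigned (zip truncates).
    return [{"egressIP": ip, "node": node}
            for ip, node in zip(egress_ips, assignable_nodes)]

status = assign_egress_ips(
    ["172.31.248.101", "172.31.248.102", "172.31.248.103"],
    ["compute-0", "compute-1"],
)
# Matches the .status.items seen above: .101 on compute-0, .102 on compute-1.
```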

$ oc new-project test

$ oc label ns test team=red
namespace/test labeled

$ oc create -f ./SDN-1332-test/list_for_pods.json
replicationcontroller/test-rc created
service/test-service created

$ oc get pod
NAME            READY   STATUS    RESTARTS   AGE
test-rc-99qdz   1/1     Running   0          12s
test-rc-bzq9x   1/1     Running   0          12s
test-rc-cbtzh   1/1     Running   0          12s
test-rc-frqnd   1/1     Running   0          12s
test-rc-jlnsh   1/1     Running   0          12s
test-rc-kmnh9   1/1     Running   0          12s
test-rc-kvsv7   1/1     Running   0          12s
test-rc-qs8gc   1/1     Running   0          12s
test-rc-snvb7   1/1     Running   0          12s
test-rc-wmrwh   1/1     Running   0          12s


$ oc rsh test-rc-99qdz
~ $ while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.101
172.31.248.102
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.101
172.31.248.102
172.31.248.102
172.31.248.102^C
~ $ exit
command terminated with exit code 130

$ oc get pod -owide
NAME            READY   STATUS    RESTARTS   AGE   IP            NODE        NOMINATED NODE   READINESS GATES
test-rc-99qdz   1/1     Running   0          62s   10.131.0.29   compute-0   <none>           <none>
test-rc-bzq9x   1/1     Running   0          62s   10.128.2.30   compute-1   <none>           <none>
test-rc-cbtzh   1/1     Running   0          62s   10.131.0.30   compute-0   <none>           <none>
test-rc-frqnd   1/1     Running   0          62s   10.128.2.33   compute-1   <none>           <none>
test-rc-jlnsh   1/1     Running   0          62s   10.131.0.28   compute-0   <none>           <none>
test-rc-kmnh9   1/1     Running   0          62s   10.128.2.29   compute-1   <none>           <none>
test-rc-kvsv7   1/1     Running   0          62s   10.131.0.31   compute-0   <none>           <none>
test-rc-qs8gc   1/1     Running   0          62s   10.131.0.32   compute-0   <none>           <none>
test-rc-snvb7   1/1     Running   0          62s   10.128.2.31   compute-1   <none>           <none>
test-rc-wmrwh   1/1     Running   0          62s   10.128.2.32   compute-1   <none>           <none>

$ oc get -o jsonpath='{.metadata.annotations.control-plane\.alpha\.kubernetes\.io/leader}'  -n openshift-ovn-kubernetes  cm ovn-kubernetes-master
{"holderIdentity":"control-plane-2","leaseDurationSeconds":60,"acquireTime":"2022-03-28T21:03:13Z","renewTime":"2022-03-28T21:56:31Z","leaderTransitions":0}
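The leader annotation is plain JSON, so the leader node can also be extracted without jsonpath; a small sketch equivalent to the `oc get` query above:

```python
import json

# The control-plane.alpha.kubernetes.io/leader annotation value,
# copied verbatim from the oc output above.
annotation = (
    '{"holderIdentity":"control-plane-2","leaseDurationSeconds":60,'
    '"acquireTime":"2022-03-28T21:03:13Z",'
    '"renewTime":"2022-03-28T21:56:31Z","leaderTransitions":0}'
)
# holderIdentity names the node hosting the active ovnkube-master,
# which is the pod to rsh into for ovn-nbctl.
leader = json.loads(annotation)["holderIdentity"]
```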

$ oc get pod -n openshift-ovn-kubernetes -l app=ovnkube-master --field-selector=spec.nodeName=control-plane-2 -o jsonpath={.items[*].metadata.name}
ovnkube-master-zvvb8

$ oc -n openshift-ovn-kubernetes rsh  ovnkube-master-zvvb8
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.32         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.33         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.28         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.32         reroute                100.64.0.5, 100.64.0.6
sh-4.4# 
sh-4.4# exit
exit


$ oc debug node/compute-0
Starting pod/compute-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 172.31.248.31
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# reboot
Terminated
sh-4.4# 
Removing debug pod ...

$ oc -n openshift-ovn-kubernetes rsh  ovnkube-master-zvvb8
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.32         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.33         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.28         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.29         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.30         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.31         reroute                100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.131.0.32         reroute                100.64.0.5, 100.64.0.6
sh-4.4# ^C

==> after rebooting the egress node, lr-policy-list still shows exactly one reroute entry per pod IP: no missing internal IPs and no duplicate records
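The pass/fail criterion above can be checked mechanically by cross-referencing the pod IPs from `oc get pod -owide` against the `ip4.src` addresses in the `lr-policy-list` output. A hedged sketch of such a check (`check_policies` is a hypothetical helper, not part of the test tooling used here):

```python
import re

# Pull the ip4.src address out of each priority-100 reroute policy row.
POLICY_RE = re.compile(r'ip4\.src == (\d+\.\d+\.\d+\.\d+)')

def check_policies(policy_output, pod_ips):
    """Compare reroute-policy source IPs against the pod IPs selected
    by the EgressIP object; return (missing_ips, duplicated_ips)."""
    seen = [m.group(1) for m in POLICY_RE.finditer(policy_output)]
    missing = sorted(set(pod_ips) - set(seen))
    duplicates = sorted({ip for ip in seen if seen.count(ip) > 1})
    return missing, duplicates
```

A healthy cluster, like the one verified above, returns `([], [])`; the original bug would have surfaced as a non-empty `missing` or `duplicates` list after the node reboot.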

Comment 6 ffernand 2022-04-08 20:03:04 UTC
*** Bug 2047416 has been marked as a duplicate of this bug. ***

Comment 12 errata-xmlrpc 2022-04-20 14:49:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.29 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:1363

