Bug 1995887 - [OVN] After rebooting an egress node, lr-policy-list is not correct: some records are duplicated or missing internal IPs
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ben Bennett
QA Contact: huirwang
Duplicates: 2034790
Blocks: 2034668
Reported: 2021-08-20 04:04 UTC by huirwang
Modified: 2023-09-18 04:25 UTC
CC List: 5 users

Doc Type: If docs needed, set a value
Clones: 2034668
Last Closed: 2022-03-10 16:05:18 UTC


Links
GitHub openshift/ovn-kubernetes pull 834 (Merged): Bug 2018966: [DownstreamMerge] Revert revert #834 (last updated 2021-12-21 16:30:31 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:05:37 UTC)

Description huirwang 2021-08-20 04:04:52 UTC
Description of problem:
After an egress node is rebooted, lr-policy-list is not correct: some records are duplicated, and some are missing internal IPs.

Version-Release number of selected component (if applicable):
4.9.0-0.nightly-2021-08-19-184748

How reproducible:
Frequently 

Steps to Reproduce:
1. Label 3 egress nodes and create one EgressIP object:
...
  spec:
    egressIPs:
    - 172.31.248.103
    - 172.31.248.104
    - 172.31.248.105
    namespaceSelector:
      matchLabels:
        name: test
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.248.103
      node: compute-1
    - egressIP: 172.31.248.104
      node: compute-2
    - egressIP: 172.31.248.105
      node: compute-0
...
2. Create two namespaces, label them name=test, and create 10 pods in each namespace.

3. Check the lr-policy-list:

sh-4.4#  ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.45         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.46         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.47         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.48         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.49         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.50         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.31         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.32         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.33         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.34         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.35         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.51         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.52         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.53         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.54         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.55         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.56         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.57         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.58         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.59         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
sh-4.4#  ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 " | wc -l
20

4. Reboot the egress node compute-0.

5. After the node is back to Ready, check lr-policy-list again.

Actual results:
There are some duplicate records. For example, one record for a source IP lists 3 internal IPs, while another record for the same source IP lists only 2:
      100                             ip4.src == 10.131.0.56         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
      100                             ip4.src == 10.131.0.56         reroute                100.64.0.6, 100.64.0.7

Some records are missing an internal IP and list only two. Curling from the affected pods then uses only 2 egress nodes, even though 3 egress nodes are available:
100                             ip4.src == 10.131.0.57         reroute                100.64.0.6, 100.64.0.7

$ oc rsh -n test2 test-rc-sbp6f
~ $  while true; do curl 172.31.249.80:9095;sleep 2; echo ""; done;
172.31.248.104
172.31.248.104
172.31.248.103
172.31.248.104
172.31.248.103
172.31.248.103
172.31.248.104
172.31.248.104
172.31.248.103
172.31.248.104
172.31.248.104
172.31.248.104
172.31.248.104
172.31.248.104
172.31.248.104
.....
172.31.248.104
172.31.248.104
172.31.248.103
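A bounded variant of the loop above can tally which egress IPs the echo server reports, which makes a missing egress node easier to spot than scrolling output. This is a minimal sketch: tally_egress_ips is a hypothetical helper name, and it assumes the server at 172.31.249.80:9095 replies with the client's source IP on a single line, as in the output above, and that seq and curl exist in the pod image.

```sh
#!/bin/sh
# tally_egress_ips (hypothetical helper): count how often each source IP was
# reported by the echo server. Lines that are not dotted-quad IPv4 addresses
# (for example the blank lines produced by `echo`) are dropped.
tally_egress_ips() {
  grep -E '^[0-9]+(\.[0-9]+){3}$' | sort | uniq -c | sort -rn
}

# Bounded version of the loop above, run inside the test pod:
# for i in $(seq 1 30); do curl -s 172.31.249.80:9095; echo; sleep 2; done | tally_egress_ips
```

With all three egress nodes working, all three egress IPs (172.31.248.103, 172.31.248.104 and 172.31.248.105) would be expected in the tally; in the run above, 172.31.248.105 (assigned to compute-0) never appears.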

sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 "
       100                             ip4.src == 10.128.2.45         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.46         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.47         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.48         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.49         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.128.2.50         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.31         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.32         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.33         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.34         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.129.2.35         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.51         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.51         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.52         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.52         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.53         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.53         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.54         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.54         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.55         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.55         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.56         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.56         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.57         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.58         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.58         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.59         reroute                100.64.0.6, 100.64.0.7
       100                             ip4.src == 10.131.0.59         reroute                100.64.0.5, 100.64.0.6, 100.64.0.7

sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 " | wc -l
28

Expected results:
After an egress node reboots, lr-policy-list should contain the same records as before the reboot (here, 20 records, each listing all three internal IPs).
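The expected result can be checked mechanically instead of by eye. The sketch below reads lr-policy-list output on stdin and reports priority-100 reroute policies that appear more than once for the same source IP, or that carry fewer next hops than there are egress nodes. It assumes the column layout shown in this report; check_egress_policies is a hypothetical helper, not part of ovn-nbctl.

```sh
#!/bin/sh
# check_egress_policies (hypothetical helper): read `ovn-nbctl lr-policy-list`
# output on stdin; $1 is the number of next hops every policy should carry.
# Assumes the layout shown in this bug:
#   <priority> ip4.src == <pod IP> reroute <hop1>, <hop2>, ...
check_egress_policies() {
  awk -v want="$1" '
    $1 == 100 && $5 == "reroute" {
      seen[$4]++
      hops = NF - 5                      # fields 6..NF are the next-hop IPs
      if (hops < want)
        printf "short: %s has %d next hop(s)\n", $4, hops
    }
    END {
      for (src in seen)
        if (seen[src] > 1)
          printf "duplicate: %s appears %d times\n", src, seen[src]
    }
  ' | sort
}

# Example, in the same shell where ovn-nbctl was run above (3 egress nodes):
# ovn-nbctl lr-policy-list ovn_cluster_router | check_egress_policies 3
```

Against the pre-reboot list this prints nothing. Against the post-reboot list it would flag eight duplicated 10.131.0.x sources and nine two-hop entries (one per duplicated source, plus 10.131.0.57).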

Additional info:
Moreover, after deleting the two namespaces, test and test2 (so all test pods are gone), some lr-policy-list records are still left.
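Leftover records of this kind can be listed by comparing the source IPs in a saved policy dump against the IPs of pods that still exist. A minimal sketch, assuming the column layout shown above; stale_policy_ips and the file names are hypothetical, and the two dumps are taken in different places (ovn-nbctl where it was run above, oc from a workstation).

```sh
#!/bin/sh
# stale_policy_ips (hypothetical helper): print source IPs that still have a
# priority-100 reroute policy but no longer belong to any pod.
#   $1: file with saved `ovn-nbctl lr-policy-list ovn_cluster_router` output
#   $2: file with whitespace-separated pod IPs
stale_policy_ips() {
  # FILENAME == ARGV[1] (not NR == FNR) so an empty pod-IP file still works.
  awk '
    FILENAME == ARGV[1] { for (i = 1; i <= NF; i++) live[$i] = 1; next }
    $1 == 100 && $5 == "reroute" && !($4 in live) { print $4 }
  ' "$2" "$1" | sort -u
}

# Example:
#   ovn-nbctl lr-policy-list ovn_cluster_router > policies.txt
#   oc get pods -A -o jsonpath='{.items[*].status.podIP}' > pod_ips.txt
#   stale_policy_ips policies.txt pod_ips.txt
```

After both test namespaces are deleted, any IP this prints is a policy that should have been cleaned up.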

The workaround is to restart the ovnkube-master pods.

Comment 1 Alexander Constantinescu 2021-10-12 10:41:11 UTC
Hi Huiran

I can't remember exactly, but I have the feeling this problem was linked to https://bugzilla.redhat.com/show_bug.cgi?id=1973215

1973215 was fixed in 4.9 before code freeze, so could you try to reproduce this problem with the latest version of 4.9 to verify whether they are indeed duplicates?

If the problem has not been resolved on 4.9: could you provide a kubeconfig / must-gather? 

Thanks in advance!

Comment 7 Alexander Constantinescu 2021-12-22 13:10:49 UTC
*** Bug 2034790 has been marked as a duplicate of this bug. ***

Comment 16 errata-xmlrpc 2022-03-10 16:05:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

Comment 17 Red Hat Bugzilla 2023-09-18 04:25:20 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days

