Bug 2034513

Summary: [OVN] After update one EgressIP in EgressIP object, one internal IP lost from lr-policy-list
Product: OpenShift Container Platform
Reporter: huirwang
Component: Networking
Assignee: Ben Bennett <bbennett>
Networking sub component: ovn-kubernetes
QA Contact: huirwang
Status: CLOSED ERRATA
Severity: medium
Priority: high
CC: anbhat, dbrahane, jechen, trozet
Version: 4.10
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-03-10 16:35:46 UTC
Type: Bug

Description huirwang 2021-12-21 07:53:21 UTC
Description of problem:
Found this issue in an OVN cluster on vSphere. It appears to be different from https://bugzilla.redhat.com/show_bug.cgi?id=2034097, even though both involve updating an EgressIP object; I could not reproduce 2034097 in the vSphere environment. Opening this bug to track the different incorrect behavior.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-18-034942 

How reproducible:
Always

Steps to Reproduce:
1. Tag 3 nodes as egress nodes
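Step 1 does not show the command used. On OVN-Kubernetes, a node becomes eligible to host egress IPs once it carries the `k8s.ovn.org/egress-assignable` label, so a minimal sketch looks like the following. The helper name `label_egress_nodes` is hypothetical, and the node names are assumed from the EgressIP status in step 2.

```shell
# Hypothetical helper: mark each named node as able to host egress IPs.
# Assumes `oc` is logged in with cluster-admin rights.
label_egress_nodes() {
  for node in "$@"; do
    # --overwrite makes the command safe to re-run on an already labeled node.
    oc label node "$node" k8s.ovn.org/egress-assignable="" --overwrite
  done
}

# Usage (node names taken from the EgressIP status in step 2):
# label_egress_nodes compute-0 compute-1 control-plane-0
```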

2. Create an EgressIP object. Once the IPs are assigned, it looks like this:
oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-21T07:21:18Z"
    generation: 2
    name: egressip-example6
    resourceVersion: "108167"
    uid: 9de73236-c1b2-4949-86cb-1358cebd2100
  spec:
    egressIPs:
    - 172.31.249.79
    - 172.31.249.246
    - 172.31.249.133
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.249.133
      node: control-plane-0
    - egressIP: 172.31.249.246
      node: compute-1
    - egressIP: 172.31.249.79
      node: compute-0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
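The output above is what `oc get` returns after assignment; the report does not include the manifest used to create the object. A minimal manifest that should yield the same `spec` is sketched below. The `status` section is written by ovn-kubernetes, so it is omitted, and the wrapper name `create_egressip` is hypothetical.

```shell
# Hypothetical helper: create the EgressIP from step 2.
# Only spec is set; ovn-kubernetes fills in status with the node assignments.
create_egressip() {
  oc apply -f - <<'EOF'
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-example6
spec:
  egressIPs:
  - 172.31.249.79
  - 172.31.249.246
  - 172.31.249.133
  namespaceSelector:
    matchLabels:
      team: red
  podSelector: {}
EOF
}
```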

3. Create namespace test with test pods in it, and add the label team=red to the namespace.
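Step 3 as commands, assuming a bare pod instead of whatever controller produced `test-rc-8p58n`. The helper name `setup_test_namespace`, the pod name `test-pod`, and the image argument are assumptions; the image needs `curl` for the loop shown under Actual results.

```shell
# Hypothetical helper for step 3. $1 is a container image that ships curl
# (an assumption; the report does not name the image it used).
setup_test_namespace() {
  image="$1"
  oc create namespace test
  # The EgressIP namespaceSelector (team: red) matches on this label.
  oc label namespace test team=red
  # Keep the pod running so it can be used with `oc rsh`.
  oc run test-pod -n test --image="$image" --command -- sleep 3600
}
```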

4. Check lr-policy-list
oc rsh -n openshift-ovn-kubernetes ovnkube-master-ct9qv 
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4# ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 " 
       100                             ip4.src == 10.128.2.22         reroute                100.64.0.2, 100.64.0.5, 100.64.0.6
       100                             ip4.src == 10.128.2.23         reroute                100.64.0.2, 100.64.0.5, 100.64.0.6

5. Update the EgressIP object, replacing 172.31.249.79 with a new IP, here 172.31.249.157:
oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-21T07:21:18Z"
    generation: 4
    name: egressip-example6
    resourceVersion: "108823"
    uid: 9de73236-c1b2-4949-86cb-1358cebd2100
  spec:
    egressIPs:
    - 172.31.249.157
    - 172.31.249.246
    - 172.31.249.133
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 172.31.249.133
      node: control-plane-0
    - egressIP: 172.31.249.246
      node: compute-1
    - egressIP: 172.31.249.157
      node: compute-0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
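The report does not say how the IP was changed (`oc edit` works just as well). A non-interactive sketch uses a JSON patch that replaces one entry of `spec.egressIPs`; in the object from step 2, index 0 holds 172.31.249.79. The helper name `replace_egressip` is hypothetical.

```shell
# Hypothetical helper: replace the egress IP at a given index of spec.egressIPs,
# leaving the other entries untouched.
#   $1 = EgressIP name, $2 = list index, $3 = new IP
replace_egressip() {
  name="$1" index="$2" new_ip="$3"
  patch=$(printf '[{"op": "replace", "path": "/spec/egressIPs/%s", "value": "%s"}]' "$index" "$new_ip")
  oc patch egressip "$name" --type=json -p "$patch"
}

# Usage for step 5:
# replace_egressip egressip-example6 0 172.31.249.157
```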

6. Check lr-policy-list again. 
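Comparing next-hop counts by eye is error-prone, so a small filter can flag affected policies. This is a sketch that assumes the column layout shown in step 4 (priority, a single `ip4.src == <pod IP>` match, the action, then a comma-separated next-hop list); `check_reroute_nexthops` is a hypothetical name, and the expected count is the number of entries in the EgressIP status.

```shell
# Hypothetical checker: read `ovn-nbctl lr-policy-list` output on stdin and
# report every priority-100 reroute policy whose next-hop count differs from
# the expected number of egress IPs ($1). Exits non-zero if any policy differs.
check_reroute_nexthops() {
  awk -v want="$1" '
    $1 == "100" && $5 == "reroute" {
      # Fields 6..NF are the next hops ("100.64.0.2," "100.64.0.5," ...).
      got = NF - 5
      if (got != want) {
        printf "%s: %d next hops, expected %d\n", $4, got, want
        bad = 1
      }
    }
    END { exit bad }
  '
}

# Usage inside the ovnkube-master pod (3 egress IPs in the EgressIP status):
# ovn-nbctl lr-policy-list ovn_cluster_router | check_reroute_nexthops 3
```

Fed the output under Actual results, it would report both pod IPs with 2 next hops and exit non-zero.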

Actual results:
Each priority-100 reroute policy in lr-policy-list now has only two next-hop (internal) IPs; 100.64.0.6 is missing:
$ oc rsh -n openshift-ovn-kubernetes ovnkube-master-ct9qv 
Defaulted container "northd" out of: northd, nbdb, kube-rbac-proxy, sbdb, ovnkube-master, ovn-dbchecker
sh-4.4#  ovn-nbctl lr-policy-list ovn_cluster_router  | grep "100 " 
       100                             ip4.src == 10.128.2.22         reroute                100.64.0.2, 100.64.0.5
       100                             ip4.src == 10.128.2.23         reroute                100.64.0.2, 100.64.0.5

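Curling an external server from a matching pod shows the same problem: traffic is balanced only across 172.31.249.133 and 172.31.249.246, and the new EgressIP 172.31.249.157 never appears.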

 oc rsh -n test test-rc-8p58n
~ $  while true; do curl 172.31.249.80:9095 --connect-timeout 2 ; echo "";sleep 2; done
172.31.249.133
172.31.249.133
172.31.249.133
172.31.249.246
172.31.249.246
172.31.249.133
172.31.249.246
172.31.249.246
172.31.249.246
172.31.249.133
172.31.249.246
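To make the missing address easier to spot, the responses can be tallied per egress IP. A sketch, assuming the server on 172.31.249.80:9095 echoes the client's source address as the output above suggests; `tally_egress_ips` is a hypothetical name, and the usage loop is bounded so that it terminates.

```shell
# Hypothetical helper: count how often each source IP was echoed back,
# most frequent first. Blank lines (a timed-out curl prints nothing before
# the echo) are ignored.
tally_egress_ips() {
  grep -v '^$' | sort | uniq -c | sort -rn
}

# Usage from the matching pod:
# for i in $(seq 30); do curl -s --connect-timeout 2 172.31.249.80:9095; echo; done | tally_egress_ips
```

With the bug present, only 172.31.249.133 and 172.31.249.246 show up in the tally; after the fix, 172.31.249.157 should appear as well.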


Expected results:
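The updated EgressIP takes effect: the reroute policies keep all three next hops, and traffic from the matching pods is balanced across all three EgressIPs, including 172.31.249.157.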

Additional info:

Comment 11 errata-xmlrpc 2022-03-10 16:35:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056