Bug 2070392

Summary: [OVN AWS] EgressIP was not balanced to another egress node after original node was removed egress label
Product: OpenShift Container Platform Reporter: huirwang
Component: Networking Assignee: Patryk Diak <pdiak>
Networking sub component: ovn-kubernetes QA Contact: huirwang
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: high CC: pdiak, skanakal
Version: 4.10   
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Clone Of:
: 2078396 (view as bug list) Environment:
Last Closed: 2022-08-31 12:34:13 UTC Type: Bug
Bug Depends On: 2078396    
Bug Blocks:    

Description huirwang 2022-03-31 03:14:11 UTC
Description of problem:
The EgressIP was not rebalanced to another egress node after the egress label was removed from the original node.

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2022-03-29-163038

How reproducible:
Hard to reproduce manually, but fails frequently in the automated test case.

Steps to Reproduce:
$ oc get nodes
NAME                                        STATUS   ROLES    AGE    VERSION
ip-10-0-50-148.us-east-2.compute.internal   Ready    master   109m   v1.23.5+1f952b3
ip-10-0-54-82.us-east-2.compute.internal    Ready    master   109m   v1.23.5+1f952b3
ip-10-0-55-50.us-east-2.compute.internal    Ready    worker   91m    v1.23.5+1f952b3
ip-10-0-58-148.us-east-2.compute.internal   Ready    worker   91m    v1.23.5+1f952b3
ip-10-0-65-102.us-east-2.compute.internal   Ready    master   109m   v1.23.5+1f952b3
ip-10-0-65-115.us-east-2.compute.internal   Ready    worker   91m    v1.23.5+1f952b3

1. Label one node, ip-10-0-55-50.us-east-2.compute.internal, as an egress node.
2. Create namespace test and add label name=qe.
3. Create an EgressIP object; the egress IP was successfully assigned.

$ oc get egressip
NAME             EGRESSIPS     ASSIGNED NODE                              ASSIGNED EGRESSIPS
egressip-47028   10.0.55.117   ip-10-0-55-50.us-east-2.compute.internal   10.0.55.117

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"k8s.ovn.org/v1","kind":"EgressIP","metadata":{"annotations":{},"name":"egressip-47028"},"spec":{"egressIPs":["10.0.55.117","10.0.52.72"],"namespaceSelector":{"matchLabels":{"name":"test"}}}}
    creationTimestamp: "2022-03-31T02:40:33Z"
    generation: 2
    managedFields:
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:status:
          .: {}
          f:items: {}
      manager: ip-10-0-50-148
      operation: Update
      time: "2022-03-31T02:40:33Z"
    - apiVersion: k8s.ovn.org/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:kubectl.kubernetes.io/last-applied-configuration: {}
        f:spec:
          .: {}
          f:egressIPs: {}
          f:namespaceSelector:
            .: {}
            f:matchLabels:
              .: {}
              f:name: {}
      manager: kubectl-client-side-apply
      operation: Update
      time: "2022-03-31T02:40:33Z"
    name: egressip-47028
    resourceVersion: "47028"
    uid: 078d97d7-e6dd-43e7-9662-4174576bcfe2
  spec:
    egressIPs:
    - 10.0.55.117
    - 10.0.52.72
    namespaceSelector:
      matchLabels:
        name: test
  status:
    items:
    - egressIP: 10.0.55.117
      node: ip-10-0-55-50.us-east-2.compute.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
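
Steps 1-3 can be sketched as the commands below. This is a reconstruction, not the reporter's actual script: the EgressIP manifest is taken from the last-applied-configuration annotation in the dump above, and the label value egress-assignable=true is taken from the node labels shown under step 4. Step 2 says name=qe while the dumped selector matches name=test; the sketch assumes name=test so that the selector matches the namespace. The run_oc wrapper and the DRY_RUN variable are hypothetical helpers that only print the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of steps 1-3. DRY_RUN=1 (the default here) only prints the oc
# commands; set DRY_RUN=0 to run them against a real cluster.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute an oc command.
run_oc() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "oc $*"
  else
    oc "$@"
  fi
}

# EgressIP manifest reconstructed from the last-applied-configuration
# annotation in the report.
egressip_manifest() {
  cat <<'EOF'
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-47028
spec:
  egressIPs:
  - 10.0.55.117
  - 10.0.52.72
  namespaceSelector:
    matchLabels:
      name: test
EOF
}

EGRESS_NODE="ip-10-0-55-50.us-east-2.compute.internal"

# 1. Mark the node as egress-assignable.
run_oc label node "$EGRESS_NODE" k8s.ovn.org/egress-assignable=true

# 2. Create the namespace and label it (assumption: name=test, to match
#    the namespaceSelector in the dumped object).
run_oc create namespace test
run_oc label namespace test name=test

# 3. Create the EgressIP object.
if [ "$DRY_RUN" = "1" ]; then
  egressip_manifest
else
  egressip_manifest | oc apply -f -
fi
```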

4. Remove the egress label from node ip-10-0-55-50.us-east-2.compute.internal and add the egress label to another node, ip-10-0-58-148.us-east-2.compute.internal.

$ oc get node ip-10-0-55-50.us-east-2.compute.internal --show-labels
NAME                                       STATUS   ROLES    AGE   VERSION           LABELS
ip-10-0-55-50.us-east-2.compute.internal   Ready    worker   51m   v1.23.5+1f952b3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-55-50.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

$ oc get node ip-10-0-58-148.us-east-2.compute.internal --show-labels
NAME                                        STATUS   ROLES    AGE   VERSION           LABELS
ip-10-0-58-148.us-east-2.compute.internal   Ready    worker   51m   v1.23.5+1f952b3   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,k8s.ovn.org/egress-assignable=true,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-58-148.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
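
The label move in step 4 might look like the following. This is a sketch under assumptions: the report does not show the exact commands used, and move_egress_label, run_oc and DRY_RUN are hypothetical helpers. A trailing "-" on the label key is the standard oc/kubectl syntax for removing a label.

```shell
#!/bin/sh
# Sketch of step 4. DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print or execute an oc command.
run_oc() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "oc $*"
  else
    oc "$@"
  fi
}

# move_egress_label OLD_NODE NEW_NODE
# Removes the egress-assignable label from OLD_NODE, then adds it to NEW_NODE.
move_egress_label() {
  run_oc label node "$1" k8s.ovn.org/egress-assignable-
  run_oc label node "$2" k8s.ovn.org/egress-assignable=true
}

move_egress_label \
  ip-10-0-55-50.us-east-2.compute.internal \
  ip-10-0-58-148.us-east-2.compute.internal
```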

Actual Result:

5. Check the EgressIP object; the assigned node was not updated.
$ oc get egressip
NAME             EGRESSIPS     ASSIGNED NODE                              ASSIGNED EGRESSIPS
egressip-47028   10.0.55.117   ip-10-0-55-50.us-east-2.compute.internal   10.0.55.117

$ oc get CloudPrivateIPConfig
No resources found

Expected Result:
The assigned node should be updated correctly in egressip object.
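
Because the failure shows up mostly in automation, a polling check along these lines could assert the expected result. It is a sketch only: wait_for_reassignment is a hypothetical helper, the retry count and interval are arbitrary defaults, and it assumes a single assigned egress IP as in the status shown above.

```shell
#!/bin/sh
# wait_for_reassignment NAME EXPECTED_NODE [TRIES] [INTERVAL_SECONDS]
# Polls the EgressIP status until the assigned node equals EXPECTED_NODE.
# Returns 0 on success, 1 if the node never changed within TRIES attempts.
wait_for_reassignment() {
  name=$1
  expected=$2
  tries=${3:-30}
  interval=${4:-2}
  assigned=""
  i=0
  while [ "$i" -lt "$tries" ]; do
    # .status.items[*].node is the field shown in the YAML dump above.
    assigned=$(oc get egressip "$name" \
      -o jsonpath='{.status.items[*].node}') || assigned=""
    if [ "$assigned" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "egressip $name still assigned to: $assigned" >&2
  return 1
}

# Example (requires a cluster):
# wait_for_reassignment egressip-47028 ip-10-0-58-148.us-east-2.compute.internal
```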

Additional info:

Comment 9 errata-xmlrpc 2022-08-31 12:34:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.10.30 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:6133
