Bug 2034087 - [OVN] EgressIP was assigned to the node which is not egress node anymore
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ben Bennett
QA Contact: huirwang
 
Reported: 2021-12-20 05:19 UTC by huirwang
Modified: 2022-03-10 16:35 UTC (History)
Last Closed: 2022-03-10 16:35:04 UTC

Links
System                  ID                                 Private  Priority  Status  Summary                               Last Updated
Github                  openshift ovn-kubernetes pull 917  0        None      open    Bug 2039099: EgressIP fixes for 4.10  2022-01-20 18:24:32 UTC
Github                  ovn-org ovn-kubernetes pull 2734   0        None      Merged  EgressIP: miscellaneous fixes         2022-01-20 18:24:34 UTC
Red Hat Product Errata  RHSA-2022:0056                     0        None      None    None                                  2022-03-10 16:35:21 UTC

Description huirwang 2021-12-20 05:19:39 UTC
Description of problem:
[OVN AWS] An EgressIP was assigned to a node that is no longer an egress node.

Version-Release number of selected component (if applicable):
4.10.0-0.ci-2021-12-19-184945

How reproducible:


Steps to Reproduce:
1. Label two nodes as egress nodes, then create one EgressIP object that contains two EgressIPs

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-20T02:49:46Z"
    generation: 6
    name: egressip-1
    resourceVersion: "57713"
    uid: 712c04a6-ae7f-4053-81c4-c9c192b85f4d
  spec:
    egressIPs:
    - 10.0.48.214
    - 10.0.48.215
    namespaceSelector:
      matchLabels:
        name: test
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.48.214
      node: ip-10-0-48-213.us-east-2.compute.internal
    - egressIP: 10.0.48.215
      node: ip-10-0-51-53.us-east-2.compute.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

2. Remove egress label from one node.
$ oc label node ip-10-0-48-213.us-east-2.compute.internal k8s.ovn.org/egress-assignable-
node/ip-10-0-48-213.us-east-2.compute.internal labeled

3. Check egressip object
$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                              ASSIGNED EGRESSIPS
egressip-1   10.0.48.214   ip-10-0-51-53.us-east-2.compute.internal   10.0.48.215

4. Add an iptables rule on node ip-10-0-51-53.us-east-2.compute.internal to drop TCP traffic destined for port 9

$ oc debug node/ip-10-0-51-53.us-east-2.compute.internal
Starting pod/ip-10-0-51-53us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.51.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# iptables -A INPUT -p tcp --destination-port 9 -j DROP
sh-4.4# 

5. Check egressip object status
$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.48.214

6. Remove the newly added iptables rule
$ oc debug node/ip-10-0-51-53.us-east-2.compute.internal
Starting pod/ip-10-0-51-53us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.51.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# iptables -D INPUT 2
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    KUBE-FIREWALL  all  --  anywhere             anywhere            
sh-4.4# 

7. Check egressip object status again
$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.48.214

8. Restart ovnkube-master pods
$ oc get pods -n openshift-ovn-kubernetes
$ oc delete pods -n openshift-ovn-kubernetes -l app=ovnkube-master
pod "ovnkube-master-cn9ql" deleted
pod "ovnkube-master-m44tg" deleted
pod "ovnkube-master-mx7sw" deleted

9. Check EgressIP object again.

10. Also double-check the node labels

$ oc get node ip-10-0-48-213.us-east-2.compute.internal --show-labels
NAME                                        STATUS   ROLES    AGE    VERSION           LABELS
ip-10-0-48-213.us-east-2.compute.internal   Ready    master   113m   v1.22.1+6859754   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-48-213.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
$ oc get node ip-10-0-48-213.us-east-2.compute.internal --show-labels | grep egress-assignable=
$

$  oc get node ip-10-0-51-53.us-east-2.compute.internal --show-labels | grep egress-assignable=
ip-10-0-51-53.us-east-2.compute.internal   Ready    worker   3h8m   v1.22.1+6859754   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,k8s.ovn.org/egress-assignable=,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-51-53.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
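
The label check in step 10 can also be done with an exact key match instead of a substring grep. What follows is only a minimal sketch, assuming the LABELS column from `oc get node --show-labels` has been copied into a string. The helper names parse_labels and is_egress_assignable are hypothetical and are not part of oc or OVN-Kubernetes.

```python
# Label key used in this report to mark a node as an egress node.
EGRESS_LABEL = "k8s.ovn.org/egress-assignable"


def parse_labels(labels_column: str) -> dict:
    """Turn the comma-separated LABELS column into a {key: value} dict."""
    labels = {}
    for pair in labels_column.split(","):
        pair = pair.strip()
        if not pair:
            continue
        # Labels such as "k8s.ovn.org/egress-assignable=" have an empty value.
        key, _, value = pair.partition("=")
        labels[key] = value
    return labels


def is_egress_assignable(labels_column: str) -> bool:
    """True when the egress-assignable key is present, whatever its value."""
    return EGRESS_LABEL in parse_labels(labels_column)


# Shortened label strings modelled on the two nodes in this report.
master_labels = "kubernetes.io/os=linux,node-role.kubernetes.io/master="
worker_labels = (
    "k8s.ovn.org/egress-assignable=,kubernetes.io/os=linux,"
    "node-role.kubernetes.io/worker="
)
print(is_egress_assignable(master_labels))
print(is_egress_assignable(worker_labels))
```

Matching on the parsed key avoids accidental hits on other labels that merely contain the same text.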



Actual results:
The EgressIP was assigned to node ip-10-0-48-213.us-east-2.compute.internal, even though the egress-assignable label had already been removed from that node.

$  oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip-1   10.0.48.214   ip-10-0-48-213.us-east-2.compute.internal   10.0.48.214

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-20T02:49:46Z"
    generation: 9
    name: egressip-1
    resourceVersion: "60866"
    uid: 712c04a6-ae7f-4053-81c4-c9c192b85f4d
  spec:
    egressIPs:
    - 10.0.48.214
    - 10.0.48.215
    namespaceSelector:
      matchLabels:
        name: test
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.48.214
      node: ip-10-0-48-213.us-east-2.compute.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""



Expected results:
An EgressIP should not be assigned to a node from which the egress-assignable label has been removed.
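
The expected behaviour can be written down as a check over the EgressIP status. The following is a minimal sketch and not the OVN-Kubernetes implementation. It assumes the output of `oc get egressip -o json` has been parsed into a dict and that node labels were collected separately into a {node: {label: value}} mapping. The function name stale_assignments is hypothetical.

```python
# Label key used in this report to mark a node as an egress node.
EGRESS_LABEL = "k8s.ovn.org/egress-assignable"


def stale_assignments(egressip_list: dict, node_labels: dict) -> list:
    """Return (object name, egress IP, node) for every status entry whose
    node does not carry the egress-assignable label."""
    stale = []
    for obj in egressip_list.get("items", []):
        name = obj.get("metadata", {}).get("name", "")
        for item in obj.get("status", {}).get("items", []):
            node = item.get("node", "")
            # A node missing from node_labels is treated as unlabelled.
            if EGRESS_LABEL not in node_labels.get(node, {}):
                stale.append((name, item.get("egressIP", ""), node))
    return stale


# Data mirroring the "Actual results" section of this report.
observed = {
    "items": [
        {
            "metadata": {"name": "egressip-1"},
            "status": {
                "items": [
                    {
                        "egressIP": "10.0.48.214",
                        "node": "ip-10-0-48-213.us-east-2.compute.internal",
                    }
                ]
            },
        }
    ]
}
labels = {
    "ip-10-0-48-213.us-east-2.compute.internal": {
        "node-role.kubernetes.io/master": ""
    },
    "ip-10-0-51-53.us-east-2.compute.internal": {
        "k8s.ovn.org/egress-assignable": "",
        "node-role.kubernetes.io/worker": "",
    },
}
for name, ip, node in stale_assignments(observed, labels):
    print(f"{name}: {ip} is assigned to unlabelled node {node}")
```

With the fix in place, a check of this kind would be expected to return an empty list for the scenario above.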

Additional info:

Comment 2 jechen 2022-01-08 20:36:52 UTC
Confirmed a similar problem on an OVN cluster on GCP. After the egress-assignable label is removed from a node, the node is still assigned an EgressIP.

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0108c-x2qcf-master-0.c.openshift-qe.internal         Ready    master   37m   v1.22.1+6859754
jechen-0108c-x2qcf-master-1.c.openshift-qe.internal         Ready    master   37m   v1.22.1+6859754
jechen-0108c-x2qcf-master-2.c.openshift-qe.internal         Ready    master   37m   v1.22.1+6859754
jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal   Ready    worker   23m   v1.22.1+6859754
jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal   Ready    worker   23m   v1.22.1+6859754

$ oc label node jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal labeled

$ oc label node jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal labeled

$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created
$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal   10.0.128.201


$ oc get CloudPrivateIPConfig 
NAME           AGE
10.0.128.101   40s
10.0.128.201   40s

$ oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-01-08T20:17:19Z"
    generation: 3
    name: egressip1
    resourceVersion: "34822"
    uid: c3f3f4c2-ea62-4287-88b8-bf05792aa86c
  spec:
    egressIPs:
    - 10.0.128.101
    - 10.0.128.201
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.128.201
      node: jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal
    - egressIP: 10.0.128.101
      node: jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""


$  oc label node jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal k8s.ovn.org/egress-assignable-
node/jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal labeled

$ oc get egressip -oyaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-01-08T20:17:19Z"
    generation: 5
    name: egressip1
    resourceVersion: "37182"
    uid: c3f3f4c2-ea62-4287-88b8-bf05792aa86c
  spec:
    egressIPs:
    - 10.0.128.101
    - 10.0.128.201
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.128.201
      node: jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal
    - egressIP: 10.0.128.101
      node: jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Comment 9 errata-xmlrpc 2022-03-10 16:35:04 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

