Description of problem:
[OVN AWS] EgressIP was assigned to a node that is no longer an egress node.

Version-Release number of selected component (if applicable):
4.10.0-0.ci-2021-12-19-184945

How reproducible:

Steps to Reproduce:
1. Label two nodes as egress nodes, then create one EgressIP object containing two egress IPs (a reconstructed manifest is sketched after these steps):

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-20T02:49:46Z"
    generation: 6
    name: egressip-1
    resourceVersion: "57713"
    uid: 712c04a6-ae7f-4053-81c4-c9c192b85f4d
  spec:
    egressIPs:
    - 10.0.48.214
    - 10.0.48.215
    namespaceSelector:
      matchLabels:
        name: test
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.48.214
      node: ip-10-0-48-213.us-east-2.compute.internal
    - egressIP: 10.0.48.215
      node: ip-10-0-51-53.us-east-2.compute.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

2. Remove the egress label from one node:

$ oc label node ip-10-0-48-213.us-east-2.compute.internal k8s.ovn.org/egress-assignable-
node/ip-10-0-48-213.us-east-2.compute.internal labeled

3. Check the EgressIP object:

$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                              ASSIGNED EGRESSIPS
egressip-1   10.0.48.214   ip-10-0-51-53.us-east-2.compute.internal   10.0.48.215

4. Add an iptables rule on node ip-10-0-51-53.us-east-2.compute.internal to drop traffic destined for TCP port 9. (Port 9 is what ovnkube-master's egress-node reachability check dials, so dropping it makes the node appear unreachable.)

$ oc debug node/ip-10-0-51-53.us-east-2.compute.internal
Starting pod/ip-10-0-51-53us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.51.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# iptables -A INPUT -p tcp --destination-port 9 -j DROP
sh-4.4#

5. Check the EgressIP object status; the remaining assignment has been dropped:

$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.48.214

6. Remove the newly added iptables rule:

$ oc debug node/ip-10-0-51-53.us-east-2.compute.internal
Starting pod/ip-10-0-51-53us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.0.51.53
If you don't see a command prompt, try pressing enter.
sh-4.4# chroot /host
sh-4.4# iptables -D INPUT 2
sh-4.4# iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target         prot opt source     destination
1    KUBE-FIREWALL  all  --  anywhere   anywhere
sh-4.4#

7. Check again; the egress IPs are still unassigned:

$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE   ASSIGNED EGRESSIPS
egressip-1   10.0.48.214

8. Restart the ovnkube-master pods:

$ oc get pods -n openshift-ovn-kubernetes
$ oc delete pods -n openshift-ovn-kubernetes -l app=ovnkube-master
pod "ovnkube-master-cn9ql" deleted
pod "ovnkube-master-m44tg" deleted
pod "ovnkube-master-mx7sw" deleted

9. Check the EgressIP object again.
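For reference, the EgressIP object in step 1 can be created from a manifest along the lines of the sketch below, reconstructed from the spec shown above; the file name egressip-1.yaml is illustrative, not the exact file used:

$ oc label node ip-10-0-48-213.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
$ oc label node ip-10-0-51-53.us-east-2.compute.internal k8s.ovn.org/egress-assignable=""
$ cat egressip-1.yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-1
spec:
  egressIPs:
  - 10.0.48.214
  - 10.0.48.215
  namespaceSelector:
    matchLabels:
      name: test
  podSelector: {}
$ oc create -f egressip-1.yaml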
Also double-check the node labels:

$ oc get node ip-10-0-48-213.us-east-2.compute.internal --show-labels
NAME                                        STATUS   ROLES    AGE    VERSION           LABELS
ip-10-0-48-213.us-east-2.compute.internal   Ready    master   113m   v1.22.1+6859754   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-48-213.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

$ oc get node ip-10-0-48-213.us-east-2.compute.internal --show-labels | grep egress-assignable=
$
$ oc get node ip-10-0-51-53.us-east-2.compute.internal --show-labels | grep egress-assignable=
ip-10-0-51-53.us-east-2.compute.internal   Ready   worker   3h8m   v1.22.1+6859754   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,k8s.ovn.org/egress-assignable=,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-51-53.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a

Actual results:
The egress IP was assigned to node ip-10-0-48-213.us-east-2.compute.internal, even though the egress-assignable label had already been removed from that node.

$ oc get egressip
NAME         EGRESSIPS     ASSIGNED NODE                               ASSIGNED EGRESSIPS
egressip-1   10.0.48.214   ip-10-0-48-213.us-east-2.compute.internal   10.0.48.214

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2021-12-20T02:49:46Z"
    generation: 9
    name: egressip-1
    resourceVersion: "60866"
    uid: 712c04a6-ae7f-4053-81c4-c9c192b85f4d
  spec:
    egressIPs:
    - 10.0.48.214
    - 10.0.48.215
    namespaceSelector:
      matchLabels:
        name: test
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.48.214
      node: ip-10-0-48-213.us-east-2.compute.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Expected results:
An egress IP should not be assigned to a node from which the egress-assignable label has been removed.

Additional info:
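As a quicker alternative to the per-node greps above, a single label-selector query lists only the nodes that still carry the egress-assignable label; given the state shown, it should print only node/ip-10-0-51-53.us-east-2.compute.internal. A minimal check using standard oc/kubectl selector syntax:

$ oc get nodes -l k8s.ovn.org/egress-assignable -o name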
Confirmed a similar problem on an OVN cluster on GCP: after a node is delabelled, an egress IP is still assigned to it.

$ oc get node
NAME                                                        STATUS   ROLES    AGE   VERSION
jechen-0108c-x2qcf-master-0.c.openshift-qe.internal         Ready    master   37m   v1.22.1+6859754
jechen-0108c-x2qcf-master-1.c.openshift-qe.internal         Ready    master   37m   v1.22.1+6859754
jechen-0108c-x2qcf-master-2.c.openshift-qe.internal         Ready    master   37m   v1.22.1+6859754
jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal   Ready    worker   23m   v1.22.1+6859754
jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal   Ready    worker   23m   v1.22.1+6859754

$ oc label node jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal labeled
$ oc label node jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal "k8s.ovn.org/egress-assignable"=""
node/jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal labeled

$ oc create -f ./SDN-1332-test/config_egressip1_ovn_ns_team_red.yaml
egressip.k8s.ovn.org/egressip1 created

$ oc get egressip
NAME        EGRESSIPS      ASSIGNED NODE                                               ASSIGNED EGRESSIPS
egressip1   10.0.128.101   jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal   10.0.128.201

$ oc get CloudPrivateIPConfig
NAME           AGE
10.0.128.101   40s
10.0.128.201   40s

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-01-08T20:17:19Z"
    generation: 3
    name: egressip1
    resourceVersion: "34822"
    uid: c3f3f4c2-ea62-4287-88b8-bf05792aa86c
  spec:
    egressIPs:
    - 10.0.128.101
    - 10.0.128.201
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.128.201
      node: jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal
    - egressIP: 10.0.128.101
      node: jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc label node jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal k8s.ovn.org/egress-assignable-
node/jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal labeled

$ oc get egressip -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    creationTimestamp: "2022-01-08T20:17:19Z"
    generation: 5
    name: egressip1
    resourceVersion: "37182"
    uid: c3f3f4c2-ea62-4287-88b8-bf05792aa86c
  spec:
    egressIPs:
    - 10.0.128.101
    - 10.0.128.201
    namespaceSelector:
      matchLabels:
        team: red
    podSelector: {}
  status:
    items:
    - egressIP: 10.0.128.201
      node: jechen-0108c-x2qcf-worker-b-54t7f.c.openshift-qe.internal
    - egressIP: 10.0.128.101
      node: jechen-0108c-x2qcf-worker-a-9xpdm.c.openshift-qe.internal
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Note that 10.0.128.101 is still assigned to worker-a even after its egress-assignable label was removed.
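On cloud platforms the assignment is also mirrored in CloudPrivateIPConfig, so the node the cloud actually holds each IP on can be cross-checked there. A sketch, assuming the CloudPrivateIPConfig v1 schema with its spec.node field:

$ oc get cloudprivateipconfig 10.0.128.101 -o jsonpath='{.spec.node}{"\n"}'

If this still prints the delabelled worker-a node, the stale assignment extends to the cloud layer as well, not just the EgressIP status.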
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056