Description of problem:

Two nodes are configured as EgressIP nodes. Pods currently use one of them for outgoing traffic, but once that node is shut down, the pods' outgoing traffic breaks instead of failing over to the other node.

Version-Release number of selected component (if applicable):

4.6.0-0.ci-2020-09-08-214738

How reproducible:

Always

Steps to Reproduce:

1. Label two nodes as egress-assignable:

oc label node compute-1 "k8s.ovn.org/egress-assignable"=""
oc label node compute-0 "k8s.ovn.org/egress-assignable"=""

2. Create namespace test and pods in it, then label the namespace and pod:

oc label ns test name=test
oc label pod hello-pod team=blue -n test

3. Create the EgressIP object:

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip7
spec:
  egressIPs:
  - 139.178.76.20
  - 139.178.76.21
  podSelector:
    matchLabels:
      team: blue
  namespaceSelector:
    matchLabels:
      name: test

4. Check the EgressIP object; the egress IPs are assigned to both nodes:

oc get egressip egressip7 -o yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  creationTimestamp: "2020-09-09T05:43:42Z"
  generation: 2
  managedFields:
  - apiVersion: k8s.ovn.org/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:egressIPs: {}
        f:namespaceSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:name: {}
        f:podSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:team: {}
    manager: oc
    operation: Update
    time: "2020-09-09T05:43:42Z"
  - apiVersion: k8s.ovn.org/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:items: {}
    manager: ovnkube
    operation: Update
    time: "2020-09-09T05:43:43Z"
  name: egressip7
  resourceVersion: "185731"
  selfLink: /apis/k8s.ovn.org/v1/egressips/egressip7
  uid: 640e20aa-2acd-45fa-be08-e83b73905ea4
spec:
  egressIPs:
  - 139.178.76.20
  - 139.178.76.21
  namespaceSelector:
    matchLabels:
      name: test
  podSelector:
    matchLabels:
      team: blue
status:
  items:
  - egressIP: 139.178.76.20
    node: compute-0
  - egressIP: 139.178.76.21
    node: compute-1

5. From the test pod, access an outside website; the first configured egress IP is used as the source IP:

oc rsh -n test hello-pod
/ # curl ifconfig.me
139.178.76.20

6. Shut down node compute-0:

oc get nodes
NAME              STATUS     ROLES    AGE     VERSION
compute-0         NotReady   worker   6h52m   v1.19.0-rc.2+068702d
compute-1         Ready      worker   6h52m   v1.19.0-rc.2+068702d
control-plane-0   Ready      master   7h1m    v1.19.0-rc.2+068702d
control-plane-1   Ready      master   7h1m    v1.19.0-rc.2+068702d
control-plane-2   Ready      master   7h1m    v1.19.0-rc.2+068702d

7. From the test pod, access an outside website again:

oc rsh -n test hello-pod
/ # curl --connect-timeout 5 ifconfig.me
curl: (7) Failed to connect to ifconfig.me port 80: Operation timed out

Actual results:

The pod cannot connect outside the cluster after node compute-0 is shut down.

Expected results:

The pod should still be able to connect outside the cluster, failing over to the other available egress IP.

Additional info:
Hi Huiran

I am going to push this out to 4.7. The reasons are:

1) In OVN/OVS we cannot have multiple reroutes matching the same traffic to multiple egress nodes. For this we would need the OVN RFE https://bugzilla.redhat.com/show_bug.cgi?id=1881826 to be implemented.

2) Even if multiple reroutes to multiple egress nodes existed, we cannot ensure that if a node silently dies (i.e. the OpenShift/Kubernetes API server is not aware of it) traffic then flows through the node which is still functioning. For that we will need the OVN RFE https://bugzilla.redhat.com/show_bug.cgi?id=1847570.

This is not a big use case I believe, and it should be fine to wait until the 4.7 release.
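To illustrate point 2: until OVN-side health checking exists, the controller itself would have to actively probe the egress nodes and move egress IPs away from a node that has silently died. A minimal Go sketch of such a probe follows; the probe port (9/tcp) and the "connection refused means the host is alive" interpretation are assumptions for illustration only, not the actual ovn-kubernetes implementation.

package main

import (
	"fmt"
	"net"
	"strings"
	"time"
)

// probeEgressNode dials an arbitrary TCP port on the node IP. A completed
// connection or an immediate "connection refused" both prove the host's
// network stack is up; only a timeout or no-route error is treated as the
// node being unreachable. Port 9 and this interpretation are assumptions.
func probeEgressNode(nodeIP string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(nodeIP, "9"), timeout)
	if err == nil {
		conn.Close()
		return true
	}
	return strings.Contains(err.Error(), "connection refused")
}

func main() {
	// Egress node IPs from the reproducer above, used here only as sample input.
	for _, ip := range []string{"139.178.76.20", "139.178.76.21"} {
		fmt.Printf("%s reachable: %v\n", ip, probeEgressNode(ip, 2*time.Second))
	}
}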
This is really important to fix as soon as possible: the customer cannot go to production without a working failover scenario, since the application will be down until the faulty node is destroyed. It also does not make sense to configure two IPs if failover does not work. This is a big use case for Telco and Financial customers.
It looks like we sort of covered this with: "If a node is deleted by a cluster administrator, any egress IP addresses assigned to it are automatically reassigned, subject to the previously described conditions." But it isn't working? Is this a known issue for OCP 4.6 GA? Thanks!
(In reply to Alexander Constantinescu from comment #5)

> This is not a big use case I believe, and it should be fine to wait until
> the 4.7 release.

Doh. So I guess there was confusion in all the scurrying to finish 4.6 features, but this is absolutely a mandatory part of the feature.

OVN-Kubernetes needs to actively detect when nodes become unreachable, and move their egress IPs away when they do. It can't just assume Nodes will get deleted if they are unavailable. See poll()/check() in https://github.com/openshift/sdn/blob/master/pkg/network/master/egressip.go for the openshift-sdn version.

OVN-Kubernetes also needs code to rebalance egress IPs when they get too unbalanced. (E.g., once the above problem is fixed, then after an upgrade the last egress node to reboot would end up with 0 egress IPs assigned.) What OpenShift SDN does (ReallocateEgressIPs() in https://github.com/openshift/sdn/blob/master/pkg/network/common/egressip.go) is that every time a node or an egress IP is added or removed, it computes both an "incremental" allocation (like what ovn-kubernetes does now) and a "from scratch" allocation (i.e., how it would have chosen to allocate the IPs if none of them were already assigned). If any node has more than twice as many egress IPs in the "from scratch" allocation as it has in the "incremental" allocation, it knows things have gotten unbalanced and it needs to proactively move some IPs over to the underallocated node(s).
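For illustration, a minimal Go sketch of that rebalancing check; the function name and data shapes here are hypothetical, and the real logic lives in ReallocateEgressIPs() linked above.

package main

import "fmt"

// needsRebalance compares the "incremental" allocation (which keeps existing
// assignments) against the "from scratch" allocation (how the IPs would be
// spread if none were assigned yet). Any node that would receive more than
// twice as many egress IPs from scratch as it currently holds is
// underallocated, and some IPs should be proactively moved to it.
func needsRebalance(incremental, fromScratch map[string][]string) []string {
	var underallocated []string
	for node, ideal := range fromScratch {
		if len(ideal) > 2*len(incremental[node]) {
			underallocated = append(underallocated, node)
		}
	}
	return underallocated
}

func main() {
	// Hypothetical post-upgrade state: the last egress node to reboot ended
	// up holding nothing, while the other node holds every egress IP.
	incremental := map[string][]string{
		"compute-0": {"139.178.76.20", "139.178.76.21"},
		"compute-1": {},
	}
	fromScratch := map[string][]string{
		"compute-0": {"139.178.76.20"},
		"compute-1": {"139.178.76.21"},
	}
	fmt.Println("underallocated nodes:", needsRebalance(incremental, fromScratch))
}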
This is ready for testing. It has been integrated on master (i.e. 4.7) with PR https://github.com/openshift/ovn-kubernetes/pull/317, so I am setting it to MODIFIED. I am working on the backport to 4.6.
So does this need a docs update once it is merged? Thanks!
Docs update for this BZ: https://github.com/openshift/openshift-docs/pull/28956 Is this okay? Thanks!
With the latest code change, why are BOTH IPs reassigned to new nodes when only one fails?

Initial working flow: traffic from the egress pod attached to this EgressIP going to an external destination is NATed to 10.0.32.112 (all good, working; see the starting config below).

Step for failover: shut down the node ovn-qgwkn-worker-canadacentral3-k4qk5 (= 10.0.32.112) and expect the traffic to then exit with 10.0.32.111 (as per the initial starting config).

Results: IP .111 is re-assigned to another node automatically and traffic flow is interrupted, and you can see that both IPs now no longer match the right nodes...

Result config after the node shutdown:

~/Documents/ocp4/ovn_egressip » oc get egressIP
NAME            EGRESSIPS     ASSIGNED NODE                           ASSIGNED EGRESSIPS
egressip-test   10.0.32.111   ovn-qgwkn-worker-canadacentral2-rhswn   10.0.32.111

status:
  items:
  - egressIP: 10.0.32.111
    node: ovn-qgwkn-worker-canadacentral2-rhswn
  - egressIP: 10.0.32.112
    node: ovn-qgwkn-worker-canadacentral1-bskbf

--------------------------------------------------------------------

Starting config:

~/Documents/ocp4/ovn_egressip » oc get egressIP
NAME            EGRESSIPS     ASSIGNED NODE                           ASSIGNED EGRESSIPS
egressip-test   10.0.32.111   ovn-qgwkn-worker-canadacentral1-bskbf   10.0.32.111

Node #1: ovn-qgwkn-worker-canadacentral1-bskbf = 10.0.32.111
Node #2: ovn-qgwkn-worker-canadacentral3-k4qk5 = 10.0.32.112

EgressIP yaml:

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-test
spec:
  egressIPs:
  - 10.0.32.111
  - 10.0.32.112
  namespaceSelector:
    matchLabels:
      name: example-egressip1
status:
  items:
  - egressIP: 10.0.32.111
    node: ovn-qgwkn-worker-canadacentral1-bskbf
  - egressIP: 10.0.32.112
    node: ovn-qgwkn-worker-canadacentral3-k4qk5

Expectation (manual mode):
#1 10.0.32.111 should not be reassigned (left untouched) and pod traffic should start exiting with this IP.
#2 10.0.32.112 should be inactive until the node comes back, or be reassigned after roughly 5 minutes of inactivity?
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633