Bug 1877273
| Summary: | [OVN] EgressIP cannot fail over to available nodes after one egressIP node shutdown | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | huirwang |
| Component: | Networking | Assignee: | Alexander Constantinescu <aconstan> |
| Networking sub component: | ovn-kubernetes | QA Contact: | huirwang |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | high | Priority: | high |
| CC: | aconstan, acossett, amulmule, bbennett, ChetRHosey, danw, jboxman, jnordell, skanakal, vpickard | | |
| Version: | 4.6 | Target Milestone: | --- |
| Target Release: | 4.7.0 | Hardware: | Unspecified |
| OS: | Unspecified | Whiteboard: | |
| Fixed In Version: | | Doc Type: | Bug Fix |
| Story Points: | --- | Clone Of: | |
| : | 1898160 (view as bug list) | Environment: | |
| Last Closed: | 2021-02-24 15:17:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1898160 | | |

Doc Text:

Cause: When a node experienced networking issues (or the kubelet failed to function properly and the node went into a non-ready state), the egress IPs assigned to that node were never reassigned elsewhere.

Consequence: The egress IP functionality was broken: packets were still routed to the faulty egress node, which could not serve traffic.

Fix: The state of all egress nodes is now verified periodically, by pinging each egress node and checking the node object's state.

Result: If a node goes down, its egress IPs are reassigned and the functionality keeps working by redirecting egress traffic to another node.
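The fix described in the Doc Text amounts to a periodic health check of every egress node: probe the node, inspect its Node object's Ready condition, and reassign its egress IPs if either check fails. Below is a minimal Go sketch of that idea, not the actual ovn-kubernetes code; the TCP dial to the kubelet port (standing in for the "ping"), the map arguments, and the reassign callback are illustrative assumptions.

```go
package main

import (
	"fmt"
	"net"
	"time"

	v1 "k8s.io/api/core/v1"
)

// isReachable stands in for the "ping" in the doc text: a short TCP dial to
// the node's kubelet port. (Illustrative; the real probe differs.)
func isReachable(nodeIP string) bool {
	conn, err := net.DialTimeout("tcp", net.JoinHostPort(nodeIP, "10250"), 2*time.Second)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// isReady returns the Node object's Ready condition.
func isReady(node *v1.Node) bool {
	for _, cond := range node.Status.Conditions {
		if cond.Type == v1.NodeReady {
			return cond.Status == v1.ConditionTrue
		}
	}
	return false
}

// checkEgressNodes runs both checks for every egress node and invokes the
// (hypothetical) reassign callback for any node that fails either one.
func checkEgressNodes(nodes map[string]*v1.Node, nodeIPs map[string]string, reassign func(node string)) {
	for name, node := range nodes {
		if !isReachable(nodeIPs[name]) || !isReady(node) {
			fmt.Printf("egress node %s unhealthy, reassigning its egress IPs\n", name)
			reassign(name)
		}
	}
}

func main() {
	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for range ticker.C {
		// In a real controller the node data would come from informers.
		checkEgressNodes(map[string]*v1.Node{}, map[string]string{}, func(string) {})
	}
}
```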
Hi Huiran

I am going to push this out to 4.7. The reasons are:

1) In OVN/OVS we cannot have multiple reroutes matching the same traffic to multiple egress nodes. For this we would need the OVN RFE https://bugzilla.redhat.com/show_bug.cgi?id=1881826 to be implemented.

2) Even if multiple reroutes to multiple egress nodes existed, we cannot ensure that if a node silently dies (i.e. the OpenShift/Kubernetes API server is not aware of it) traffic then flows through the node which is still functioning. For that we will need the OVN RFE https://bugzilla.redhat.com/show_bug.cgi?id=1847570.

This is not a big use case, I believe, and it should be fine to wait until the 4.7 release.

This is really important to fix as soon as possible: the customer cannot go to production without a working failover scenario, and the application will be down until the faulty node is destroyed. It also does not make sense to configure two IPs if failover is not working. This is a big use case for Telco and financial customers.

It looks like we sort of covered this with: "If a node is deleted by a cluster administrator, any egress IP addresses assigned to it are automatically reassigned, subject to the previously described conditions." But it isn't working? Is this a known issue for OCP 4.6 GA? Thanks!

(In reply to Alexander Constantinescu from comment #5)

> This is not a big use case I believe and should be fine waiting for until
> the 4.7 release.

Doh. So I guess there was confusion in all the scurrying to finish 4.6 features, but this is absolutely a mandatory part of the feature.

OVN-Kubernetes needs to actively detect when nodes become unreachable, and move their egress IPs away when they do. It can't just assume Nodes will get deleted if they are unavailable. See poll()/check() in https://github.com/openshift/sdn/blob/master/pkg/network/master/egressip.go for the openshift-sdn version.

OVN-Kubernetes also needs code to rebalance egress IPs when they get too unbalanced. (For example, once the above problem is fixed, then after an upgrade, the last egress node to reboot would end up with 0 egress IPs assigned afterward.) What OpenShift SDN does (ReallocateEgressIPs() in https://github.com/openshift/sdn/blob/master/pkg/network/common/egressip.go) is that every time a node or an egress IP is added or removed, it computes both an "incremental" allocation (like what ovn-kubernetes does now) and a "from scratch" allocation (i.e. how it would have chosen to allocate the IPs if none of them were already assigned). Then, if any node has more than twice as many egress IPs in the "from scratch" allocation as it would have had in the "incremental" allocation, it knows things have gotten unbalanced and it needs to proactively move some IPs over to the underallocated node(s). A sketch of this heuristic follows below.

This is ready for testing. It has been integrated on master (i.e. 4.7, with PR https://github.com/openshift/ovn-kubernetes/pull/317), so I am setting it to MODIFIED. I am working on the back-port to 4.6.

So does this need a docs update once it is merged? Thanks!

Docs update for this BZ: https://github.com/openshift/openshift-docs/pull/28956. Is this okay? Thanks!
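The rebalancing heuristic described in the comment above can be captured in a few lines. This is a minimal sketch of the comparison step only, assuming per-node IP counts are already computed; the function name and map shapes are illustrative, not openshift-sdn's actual ReallocateEgressIPs() signature.

```go
package rebalance

// underallocatedNodes compares the "incremental" allocation (how many egress
// IPs each node currently holds) with a "from scratch" allocation (how many
// each node would hold if all IPs were re-allocated from nothing). A node
// whose from-scratch share is more than twice its incremental share is
// underallocated, so some IPs should proactively be moved onto it.
func underallocatedNodes(incremental, fromScratch map[string]int) []string {
	var under []string
	for node, fresh := range fromScratch {
		if fresh > 2*incremental[node] {
			under = append(under, node)
		}
	}
	return under
}
```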
With the latest code change, why are BOTH IPs reassigned to new nodes when only one fails?

Initial working flow:
Active traffic flows from the egress pod attached to this EgressIP.
Pod-to-external traffic is NATed to 10.0.32.112 (all good, working; see the starting config below).
-----
The failover step is to shut down the node ovn-qgwkn-worker-canadacentral3-k4qk5 (= 10.0.32.112) and expect the traffic to then exit via 10.0.32.111 (as per the initial starting config).
Results:
IP .111 is automatically reassigned to another node, traffic flow is interrupted, and you can now see that both IPs no longer match the right nodes...
Resulting config after the node shutdown:

~/Documents/ocp4/ovn_egressip » oc get egressIP
NAME            EGRESSIPS     ASSIGNED NODE                           ASSIGNED EGRESSIPS
egressip-test   10.0.32.111   ovn-qgwkn-worker-canadacentral2-rhswn   10.0.32.111

status:
  items:
  - egressIP: 10.0.32.111
    node: ovn-qgwkn-worker-canadacentral2-rhswn
  - egressIP: 10.0.32.112
    node: ovn-qgwkn-worker-canadacentral1-bskbf
--------------------------------------------------------------------
Starting config:

~/Documents/ocp4/ovn_egressip » oc get egressIP
NAME            EGRESSIPS     ASSIGNED NODE                           ASSIGNED EGRESSIPS
egressip-test   10.0.32.111   ovn-qgwkn-worker-canadacentral1-bskbf   10.0.32.111

Node #1: ovn-qgwkn-worker-canadacentral1-bskbf = 10.0.32.111
Node #2: ovn-qgwkn-worker-canadacentral3-k4qk5 = 10.0.32.112
EgressIP YAML:

apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-test
spec:
  egressIPs:
  - 10.0.32.111
  - 10.0.32.112
  namespaceSelector:
    matchLabels:
      name: example-egressip1
status:
  items:
  - egressIP: 10.0.32.111
    node: ovn-qgwkn-worker-canadacentral1-bskbf
  - egressIP: 10.0.32.112
    node: ovn-qgwkn-worker-canadacentral3-k4qk5
Expectation (manual mode):
#1 10.0.32.111 should not be reassigned (left untouched), and pod traffic should start exiting with this IP.
#2 10.0.32.112 should be inactive until the node comes back, or be reassigned after roughly 5 minutes of inactivity?
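The expectation amounts to an invariant: on a node failure, only the egress IPs hosted on the failed node move (or go inactive when no other egress node is free); every other assignment stays untouched. A small sketch of that invariant, with hypothetical types and names:

```go
package failover

// assignment pairs an egress IP with the node currently hosting it.
type assignment struct {
	EgressIP string
	Node     string
}

// reassignFailed moves only the IPs hosted on failedNode onto nodes from
// spare, leaving all other assignments untouched. If no spare node is left,
// the IP stays unassigned (inactive) until its node comes back.
func reassignFailed(current []assignment, failedNode string, spare []string) []assignment {
	out := make([]assignment, 0, len(current))
	next := 0
	for _, a := range current {
		if a.Node == failedNode {
			if next < len(spare) {
				a.Node = spare[next]
				next++
			} else {
				a.Node = "" // inactive until the node returns
			}
		}
		out = append(out, a)
	}
	return out
}
```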
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633
Description of problem:
Two nodes were configured as egress nodes for an EgressIP. Pods were using one egress node for outgoing traffic, but once that node was shut down, the pods' outgoing traffic broke instead of failing over to the other node.

Version-Release number of selected component (if applicable):
4.6.0-0.ci-2020-09-08-214738

How reproducible:
Always

Steps to Reproduce:
1. Label two nodes as egress-assignable:
oc label node compute-1 "k8s.ovn.org/egress-assignable"=""
oc label node compute-0 "k8s.ovn.org/egress-assignable"=""

2. Create namespace test and pods in it. Add labels to the namespace and pod:
oc label ns test name=test
oc label pod hello-pod team=blue -n test

3. Create the EgressIP object:
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip7
spec:
  egressIPs:
  - 139.178.76.20
  - 139.178.76.21
  podSelector:
    matchLabels:
      team: blue
  namespaceSelector:
    matchLabels:
      name: test

4. Check the EgressIP object; the egress IPs were assigned to the two nodes:
oc get egressip egressip7 -o yaml
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  creationTimestamp: "2020-09-09T05:43:42Z"
  generation: 2
  managedFields:
  - apiVersion: k8s.ovn.org/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        .: {}
        f:egressIPs: {}
        f:namespaceSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:name: {}
        f:podSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:team: {}
    manager: oc
    operation: Update
    time: "2020-09-09T05:43:42Z"
  - apiVersion: k8s.ovn.org/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        .: {}
        f:items: {}
    manager: ovnkube
    operation: Update
    time: "2020-09-09T05:43:43Z"
  name: egressip7
  resourceVersion: "185731"
  selfLink: /apis/k8s.ovn.org/v1/egressips/egressip7
  uid: 640e20aa-2acd-45fa-be08-e83b73905ea4
spec:
  egressIPs:
  - 139.178.76.20
  - 139.178.76.21
  namespaceSelector:
    matchLabels:
      name: test
  podSelector:
    matchLabels:
      team: blue
status:
  items:
  - egressIP: 139.178.76.20
    node: compute-0
  - egressIP: 139.178.76.21
    node: compute-1

5. From the test pod, access an outside website; the first configured egress IP is used as the source IP:
oc rsh -n test hello-pod
/ # curl ifconfig.me
139.178.76.20

6. Shut down node compute-0:
oc get nodes
NAME              STATUS     ROLES    AGE     VERSION
compute-0         NotReady   worker   6h52m   v1.19.0-rc.2+068702d
compute-1         Ready      worker   6h52m   v1.19.0-rc.2+068702d
control-plane-0   Ready      master   7h1m    v1.19.0-rc.2+068702d
control-plane-1   Ready      master   7h1m    v1.19.0-rc.2+068702d
control-plane-2   Ready      master   7h1m    v1.19.0-rc.2+068702d

7. From the test pod, access an outside website again:
oc rsh -n test hello-pod
/ # curl --connect-timeout 5 ifconfig.me
curl: (7) Failed to connect to ifconfig.me port 80: Operation timed out

Actual results:
The pod cannot connect outside the cluster after node compute-0 is shut down.

Expected results:
The pod should still be able to connect outside the cluster; the egress IP should fail over to the other available egress node.

Additional info:
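For a failover test like the one above, the EgressIP status transitions can also be watched programmatically rather than by re-running oc. A minimal client-go sketch, assuming a kubeconfig at the default location (error handling reduced to panics for brevity):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// EgressIP is a cluster-scoped CRD served at k8s.ovn.org/v1.
	gvr := schema.GroupVersionResource{Group: "k8s.ovn.org", Version: "v1", Resource: "egressips"}
	w, err := client.Resource(gvr).Watch(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	// Print every status change, e.g. an IP moving between nodes on failover.
	for ev := range w.ResultChan() {
		eip, ok := ev.Object.(*unstructured.Unstructured)
		if !ok {
			continue
		}
		items, _, _ := unstructured.NestedSlice(eip.Object, "status", "items")
		fmt.Printf("%s %s: %v\n", ev.Type, eip.GetName(), items)
	}
}
```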