Bug 1762580
Summary: | Cannot access the service's externalIP with egressIP from some pods despite OCP update to 3.11.146-1 | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Min Woo Park <mpark> |
Component: | Networking | Assignee: | Juan Luis de Sousa-Valadas <jdesousa> |
Networking sub component: | openshift-sdn | QA Contact: | huirwang |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | medium | ||
Priority: | medium | CC: | anbhat, bbennett, cdc, dageoffr, danw, huirwang, jdesousa, jinjli, nstielau, openshift-bugs-escalate, palonsor, pamoedom, pweil, sponnaga, zzhao |
Version: | 3.11.0 | ||
Target Milestone: | --- | ||
Target Release: | 4.6.0 | ||
Hardware: | x86_64 | ||
OS: | Linux | ||
Whiteboard: | SDN-CUST-IMPACT | ||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause:
Conntrack was not enabled for openshift-sdn in multitenant mode.
Consequence:
Pods were unable to reach externalIP services.
Fix:
Enable conntrack for openshift-sdn in multitenant mode.
Result:
Pods can now reach externalIP services.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-10-27 15:54:19 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | |||
Bug Depends On: | |||
Bug Blocks: | 1901043 |
Comment 20
Juan Luis de Sousa-Valadas
2019-12-23 14:16:31 UTC
Huiran, I remember we verified this kind of issue before. Could you also help check whether it can still be reproduced in 3.11.146-1?

Hi Min,

Actually, do this instead:

# iptables -N OPENSHIFT-PREROUTING -t nat
# iptables -I PREROUTING -t nat -j OPENSHIFT-PREROUTING
# iptables -A OPENSHIFT-PREROUTING -t nat -m mark --mark 0x1/0x1 -j RETURN
# iptables -A OPENSHIFT-PREROUTING -t nat -m mark '!' --mark 0x0 -j ACCEPT

If there are errors, run this instead:

# iptables -D PREROUTING -t nat -j OPENSHIFT-PREROUTING
# iptables -D OPENSHIFT-PREROUTING -t nat -m mark --mark 0x1/0x1 -j RETURN
# iptables -D OPENSHIFT-PREROUTING -t nat -m mark '!' --mark 0x0 -j ACCEPT
# iptables -X OPENSHIFT-PREROUTING -t nat

This is only necessary on the nodes with an egress IP, but it may be done on every node.

Updating the release to track the development branch. Juan is actively working on the issue, and we will work out the backport as soon as we have a tested fix.

Hi Huiran,

These are the steps to reproduce. Deploy a cluster with the *multitenant* plugin. For both RHCOS and RHEL, have four nodes which look like:

NAME    HOST    HOST IP         SUBNET         EGRESS CIDRS  EGRESS IPS
node-0  node-0  136.144.52.250  10.130.2.0/23                [136.144.52.242]
node-1  node-1  136.144.52.230  10.131.2.0/23
node-2  node-2  136.144.52.243  10.129.2.0/23
node-3  node-3  136.144.52.241  10.128.2.0/23

For this scenario, node-0 has the egressIP, node-1 has the externalIP, node-2 has the client, and node-3 has the server.

$ oc get svc
NAME             TYPE       CLUSTER-IP      EXTERNAL-IP      PORT(S)     AGE
hello-service1   ClusterIP  172.30.77.123   136.144.52.230   27018/TCP   8h

The externalIP is *the host IP of the hostsubnet*. You could use a different externalIP, provided that it belongs to node-1 and that the rest of the nodes can reach it. The client runs on node-2 in the test-client project, which uses the egressIP, and the server runs on node-3 in the test-server project, which may or may not have an egressIP.
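The `--mark 0x1/0x1` match in the iptables rules above uses value/mask semantics: a packet matches when (packet_mark & mask) == (value & mask), so SDN-marked egress traffic hits the RETURN rule, while `'!' --mark 0x0` catches any other nonzero mark. A minimal sketch of that evaluation in shell arithmetic (the `mark_matches` helper is purely illustrative, not an iptables command):

```shell
# Emulate iptables' "-m mark --mark VALUE/MASK" test:
# a packet matches when (packet_mark & mask) == (value & mask).
mark_matches() {
  local pkt=$1 value=$2 mask=$3
  [ $(( pkt & mask )) -eq $(( value & mask )) ]
}

mark_matches 0x1 0x1 0x1 && echo "0x1 matches 0x1/0x1"        # marked egress packet: RETURN
mark_matches 0x0 0x1 0x1 || echo "0x0 does not match 0x1/0x1" # unmarked packet falls through
```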
Client and server must *not* be joined (as in "oc adm pod-network join-projects").

The test *must* be run on both all-RHCOS and all-RHEL nodes; doing all-RHEL and all-RHCOS covers all possible scenarios. The test cannot be performed on a single host, because then we may hit an edge case where it happens to work; if it works across 4 different hosts, that is the worst case and should cover all the different scenarios.

I have verified this manually, but it needs to be validated again for both RHCOS and RHEL by QA as soon as the PR merges. Thanks again, Huiran, for the multiple clusters.

*** Bug 1717487 has been marked as a duplicate of this bug. ***

Hi Huiran,

I believe this is related to RHCOS vs RHEL 7 behavior. Could you deploy a cluster with 4 RHEL nodes: one with the egressIP, one with the externalIP, one with the client pod, and one with the server pod? I think this may work on RHEL 7, because it works in my local 3.11 fork, so I suspect RHEL is the reason; I can't think of any other difference in SDN between 3.11 and 4.6 that could cause this.

Hi Huiran,

Because this works on RHEL 7, doesn't work on RHEL 8, and the customer with this use case is using OCP 3.11, which is RHEL 7 only, I think we should mark this bug as VERIFIED so that we can backport it all the way back to OCP 3.11, and file a new low-priority bug specifically for the RHCOS case. Would QA agree with that?

*** Bug 1881882 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
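For reference, the reproduction setup described in the steps above could be assembled with commands along these lines. This is only a sketch using the node names, IPs, project names, and service from the example; `<client-pod>` is a placeholder for the actual client pod name, and exact behavior may differ between 3.11 and 4.x:

```shell
# Assign the egress IP from the example to node-0's hostsubnet
oc patch hostsubnet node-0 --type=merge -p '{"egressIPs":["136.144.52.242"]}'

# Make the test-client project egress via that IP
oc patch netnamespace test-client --type=merge -p '{"egressIPs":["136.144.52.242"]}'

# Give the service an externalIP equal to node-1's host IP
oc -n test-server patch svc hello-service1 --type=merge \
  -p '{"spec":{"externalIPs":["136.144.52.230"]}}'

# From the client pod on node-2, try to reach the externalIP service
oc -n test-client exec <client-pod> -- curl -s http://136.144.52.230:27018
```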