Bug 1996108
Summary: | Allow backwards compatibility of shared gateway mode to inject host-based routes into OVN | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Tim Rozet <trozet>
Component: | Networking | Assignee: | Tim Rozet <trozet>
Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen>
Status: | CLOSED ERRATA | Docs Contact: |
Severity: | high | |
Priority: | high | CC: | akaris, alosadag, bschmaus, bzhai, djuran, fbaudin, fherrman, fminafra, jseunghw, kkarampo, lars, mapandey, mavazque, mmethot, rcegan, rcernin, sasun, shishika, skanakal, surya, vfarias
Version: | 4.8 | Flags: | skanakal: needinfo-
Target Milestone: | --- | |
Target Release: | 4.10.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | Bug Fix
Doc Text:
Cause:
In 4.8, the default gateway mode for OVN-Kubernetes deployments moved from "local" to "shared". These modes affect ingress and egress traffic for a Kubernetes node. With local gateway mode, all traffic is routed through the host kernel networking stack before egressing from or ingressing into the cluster; in other words, the host routing table and iptables rules are evaluated on ingress/egress packets before they either enter OVN or are sent to the next hop outside the node. In shared gateway mode, ingress/egress traffic to/from OVN-networked pods bypasses the kernel routing table and is sent directly out of the NIC via OVS. The advantages of this include better performance, the ability to hardware offload, and less SNAT'ing. One of the disadvantages of shared mode (bypassing the kernel) is that a user's custom routes and iptables rules are not respected for egress traffic.
Consequence:
Users with custom host routing or iptables rules who upgrade to version 4.8 or newer may see unintended egress routing, because packets bypass the host kernel. There is currently no supported way to configure the equivalent routes inside OVN.
Fix:
Allow users to configure the gateway mode, so that users who depend on custom kernel routing rules to steer traffic keep that behavior. Note: migrating gateway modes in a cluster may result in a temporary ingress/egress traffic outage.
Result:
In releases 4.8 and 4.9, a config map needs to be created that signals the Cluster Network Operator (CNO) to switch the gateway mode:
apiVersion: v1
kind: ConfigMap
metadata:
  name: gateway-mode-config
  namespace: openshift-network-operator
data:
  mode: "local"
immutable: true
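
On a cluster that is already running a fixed 4.8/4.9 version, one way to create this config map from the CLI is sketched below. This is an illustrative example only, assuming cluster-admin access; CNO then picks up the config map and switches the gateway mode:

cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: gateway-mode-config
  namespace: openshift-network-operator
data:
  mode: "local"
immutable: true
EOF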
For releases 4.10 and later, a new API field called "routingViaHost" is exposed. Setting this in the CNO configuration routes egress traffic through the kernel before it leaves the node:
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      mtu: 1400
      genevePort: 6081
      gatewayConfig:
        routingViaHost: true
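
One hedged way to set this field from the CLI (a sketch, not an official procedure; assumes cluster-admin access and that the cluster is on 4.10 or later):

# enable routing egress traffic via the host kernel
oc patch networks.operator.openshift.io cluster --type=merge \
  -p '{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"gatewayConfig":{"routingViaHost":true}}}}}'

# confirm the setting was applied
oc get networks.operator.openshift.io cluster \
  -o jsonpath='{.spec.defaultNetwork.ovnKubernetesConfig.gatewayConfig}'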
Workaround:
Users on 4.8 or 4.9 in *shared* gateway mode who do not have the fixed 4.8/4.9 versions may attempt to use the config map mentioned above. However, after doing this, service traffic and egress firewall may no longer function correctly. To fix service traffic, a route matching the service CIDR needs to be manually deleted on each node. For example, assume a service CIDR of 10.96.0.0/16:
1. On shared gateway mode, there will be a route towards the br-ex interface like:
[root@ovn-worker ~]# ip route show 10.96.0.0/16
10.96.0.0/16 via 172.18.0.1 dev br-ex mtu 1400
2. In local gateway mode for versions < 4.10, this route needs to point to the ovn-k8s-mp0 interface. Manually remove the shared gateway route on each node (a CLI sketch for doing this across all nodes follows step 3):
[root@ovn-worker ~]# ip route del 10.96.0.0/16
[root@ovn-worker ~]# ip route show 10.96.0.0/16
[root@ovn-worker ~]#
3. Restart ovnkube-node. ovnkube-node will re-add the correct route, which should point towards ovn-k8s-mp0. For example:
[root@ovn-worker ~]# ip route show 10.96.0.0/16
10.96.0.0/16 via 10.244.1.1 dev ovn-k8s-mp0
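
For clusters where direct shell access to the nodes is impractical, the following is a rough sketch of performing steps 2 and 3 from the CLI. It is illustrative only: it assumes cluster-admin access, the default openshift-ovn-kubernetes labels, and that 10.96.0.0/16 is your service CIDR (check it first).

# confirm the service CIDR used in the example above
oc get network.config.openshift.io cluster -o jsonpath='{.status.serviceNetwork}'

# step 2: remove the shared-gateway service route on every node
for node in $(oc get nodes -o name); do
  oc debug "$node" -- chroot /host ip route del 10.96.0.0/16
done

# step 3: restart ovnkube-node so it re-adds the route via ovn-k8s-mp0
# (add --field-selector spec.nodeName=<node> to restrict this to one node)
oc -n openshift-ovn-kubernetes delete pod -l app=ovnkube-node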
Note, there is no current workaround to make egress firewall work. Users should upgrade to a fixed version to ensure egress firewall works.
Story Points: | --- | |
Clone Of: | | |
: | 2036977 (view as bug list) | Environment: |
Last Closed: | 2022-03-10 16:05:43 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | | |
Bug Blocks: | 2000007, 2036977 | |
Description
Tim Rozet
2021-08-20 14:58:13 UTC
A potential solution here is to provide a config flag, --force-host-network, where the following would happen:

1. route policies on the OVN DVR for all nodes are added to redirect to mp0 (forcing the traffic into the host to be routed from OVN pods)
2. modify the load balancers on the GR (host -> service traffic):
   a. host -> service traffic still goes via br-ex
   b. only the hairpin host network endpoint is rerouted back to the shared gw bridge and eventually the host.
   c. other host endpoints would need to be DNAT'ed to their host endpoints, then routes added to the GR to force traffic towards the DVR (snat'ed to the join subnet, 100.64.0.x)
   d. the route policy in the DVR (set in 1) forces the traffic to mp0, and the host routes the traffic and snats
   e. would need a route on the host for return traffic for the join subnet to go back into mp0

Now that I have a better understanding of the problem, ignore the comment 1 solution...and I've updated the description.

A potential workaround is to add custom routes to each OVN gateway router (GR). This needs to be done manually on each node. In order to add a custom route to each GR:

1. exec into the leader nbdb container for the ovnkube-master pod. You may need to try each one to find the leader. Inside the container you can issue ovn-nbctl show; if this command succeeds you are in the leader.
2. check the current routes in your node's gateway router. The name of the gateway router is always "GR_<node name>". For example, if my node name is "ovn-worker":
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker
IPv4 Routes
10.244.0.0/16 100.64.0.1 dst-ip
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_ovn-worker
3. add your new route:
[root@ovn-control-plane ~]# ovn-nbctl lr-route-add GR_ovn-worker 192.168.1.0/24 192.168.0.3
[root@ovn-control-plane ~]# ovn-nbctl lr-route-list GR_ovn-worker
IPv4 Routes
192.168.1.0/24 192.168.0.3 dst-ip
10.244.0.0/16 100.64.0.1 dst-ip
0.0.0.0/0 172.18.0.1 dst-ip rtoe-GR_ovn-worker

There is no CRD or any API exposed today to add custom routes into OVN.

This seems like a critical problem: how are we supposed to access services on networks to which the host is directly attached? The workaround suggested here doesn't seem appropriate for directly connected host routes. We would need the OVN equivalent of:
ip route add 10.253.0.0/23 dev vlan210
The lr-route-add command doesn't seem like it will work because it requires a `nexthop` argument, but there won't be one for directly attached networks.
> One other possible solution is to have ovnkube read the host routing table for the subnet attached to br-ex, and automatically add those routes to each GR. I'm not sure we want to support this or not.
I don't think this would be sufficient: you don't want the routing table "for the subnet attached to br-ex"; you'd want the actual host routing table. Otherwise you still wouldn't have access to directly attached networks.
Tim has suggested via slack that we might be able to work around this with a route policy, such as:
ovn-nbctl lr-policy-add ovn_cluster_router 1004 \
'inport == "rtos-oct-03-26-compute" && ip4.dst == 10.253.0.0/23' \
reroute 10.130.0.2
This worked partially: we saw traffic egressing on ovn-k8s-mp0, which is what we wanted to see, but it didn't appear to leave the host. This also has the disadvantages that (a) it requires knowledge of the host IP on the target network, which ideally wouldn't be necessary, and (b) it requires one rule per host, which means we're stuck with manual work whenever we add a node.
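
If you experiment with a policy like the one above, it can be inspected and removed again from the nbdb leader; a small hedged sketch, using the router name, priority, and match from the example:

# list the logical router policies to verify the reroute rule is present
ovn-nbctl lr-policy-list ovn_cluster_router

# back the change out by deleting just this policy (priority and match as added above)
ovn-nbctl lr-policy-del ovn_cluster_router 1004 \
    'inport == "rtos-oct-03-26-compute" && ip4.dst == 10.253.0.0/23'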
David Guthrie has suggested adding the target network as an `additionalNetworks` configuration in the SDN config, so that pods get addresses directly on the target network. Does that seem like a viable solution? I guess it would solve the routing problem, but we would have to pay closer attention to IP address availability if we were consuming one address per container when we only need one per host.
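
For context, a rough, hypothetical sketch of what that suggestion could look like as an `additionalNetworks` entry in the cluster network operator config is shown below. The network name, namespace, master interface, and IPAM choice are all made up for illustration and would need to match the actual target network.

# hypothetical example only; note a merge patch replaces any existing additionalNetworks list
oc patch networks.operator.openshift.io cluster --type=merge -p '
{"spec":{"additionalNetworks":[{
  "name":"target-net",
  "namespace":"default",
  "type":"Raw",
  "rawCNIConfig":"{\"cniVersion\":\"0.3.1\",\"name\":\"target-net\",\"type\":\"macvlan\",\"master\":\"vlan210\",\"ipam\":{\"type\":\"dhcp\"}}"
}]}}'

Pods would then request an address on that network via the k8s.v1.cni.cncf.io/networks annotation; as noted above, this consumes one address per pod rather than one per host.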
We were able to get the connectivity we wanted by implementing the routing policy that Tim suggested...

ovn-nbctl lr-policy-add ovn_cluster_router 1004 \
    'inport == "rtos-oct-03-26-compute" && ip4.dst == 10.253.0.0/23' \
    reroute 10.130.0.2

...and then adding a NAT rule on the nodes:

iptables -t nat -I POSTROUTING 1 -s 10.128.0.0/14 -d 10.253.0.0/23 -j MASQUERADE

(Where 10.128.0.0/14 is the cluster network.)

We've opted to try switching to OpenShift-SDN instead because that ultimately seems like a simpler solution, since the above changes require manual operation on the OVN controllers for every node (and every new node as we add them), combined with a machineconfig to set up the iptables rules on the workers, and would probably result in supportability questions at some point.

After some discussion we will support the previous functionality of routing all egress traffic via the kernel. This will allow the previous behavior to continue working. This mode of ovn-kubernetes is called "local gateway" mode, while the default mode in 4.8 and later is called "shared gateway" mode. Local gateway mode still exists in 4.8 and later; it is just only enabled right now via a hidden configuration. As a workaround for now, I'll provide the instructions for enabling this mode. However, we will come up with a proper configuration knob exposed via the cluster network config to switch between gateway modes. Note, for now we have only validated that migrating from local gateway mode -> shared gateway mode works, and not the reverse. We will validate/fix this though.

If a customer is relying on custom routes/iptables rules to steer egress traffic, it is advised to stay on local gateway mode when upgrading from 4.7 -> 4.8 -> 4.9. To deploy 4.8 or later with local gateway mode, you need to create a config map indicating that gateway mode and have it present at deploy/upgrade time. To do this for a fresh install:

1. put your install-config.yaml in your <install folder>
2. openshift-install create-manifests --dir=<install folder>
3. create a file like this in the newly created manifests dir:
apiVersion: v1
kind: ConfigMap
metadata:
  name: gateway-mode-config
  namespace: openshift-network-operator
data:
  mode: "local"
immutable: true
4. openshift-install create cluster --dir=<install folder>

I'll keep this bug open to address any issues with switching from shared gateway back to local gateway mode.

*** Bug 2000007 has been marked as a duplicate of this bug. ***

Hello, my customer has already deployed OCP 4.8 (with the default "shared gateway" mode). You only provided instructions on how to enable "local gateway" mode at installation time. Is it possible to change the configuration for a cluster that is already installed? What would be the change? Just adding the missing configmap in the openshift-network-operator namespace?

Hi, we have just faced the problem with the default gateway mode set to shared in 4.8. The use case for the customer is setting up an external ODF (Ceph) cluster. All nodes are connected to the Ceph external cluster network via a secondary interface, which is not the br-ex one. We then set up a couple of routes in the host network routing table via nmstate configs. That was working OK in 4.6 EUS; however, we noticed this change in 4.8 and needed to switch back to local gateway mode. I think this is a valid use case, since it makes sense to avoid mixing cluster traffic with storage traffic on the same br-ex interface.
The customer would like to know whether it is expected that policies or rules can be applied directly in the OVN nbdb, so we can keep shared gateway mode configured (since it is the default configuration and probably more efficient than local gateway mode).

@trozet is it possible to switch the gateway mode of an existing cluster (absent either an upgrade or a fresh install)? We're hitting this on a second cluster that was upgraded from 4.6 -> 4.8 and suddenly stopped working. I can roll back to OpenShiftSDN, but I would be interested in trying to modify the gateway mode instead if that's possible.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056