Bug 2060159
| Field | Value |
|---|---|
| Summary | LGW: External->Service of type ETP=Cluster doesn't go to the node |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | ovn-kubernetes |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Version | 4.10 |
| Target Release | 4.11.0 |
| Keywords | Regression, Upgrades |
| Reporter | elevin |
| Assignee | Surya Seetharaman <surya> |
| QA Contact | elevin |
| CC | cgoncalves, djuran, fbaudin, fherrman, fpaoline, mmahmoud, surya, wking |
| Doc Type | Known Issue |
| Type | Bug |
| Bug Blocks | 2073137 |
| Last Closed | 2022-08-10 10:51:53 UTC |

Doc Text (Known Issue):

- Cause: Host routes are ignored when LoadBalancer-type Services are configured with the "Cluster" external traffic policy.
- Consequence: Egress traffic of LoadBalancer-type Services is always steered to the default gateway, regardless of whether a better matching route exists in the host routing table.
- Workaround: Set LoadBalancer-type Services to the "Local" external traffic policy.
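The documented workaround amounts to changing one field on the affected Service. A minimal sketch of the relevant fields (the Service name, namespace, selector, and ports here are hypothetical, not from this bug):

```yaml
# Hypothetical Service illustrating the documented workaround:
# with externalTrafficPolicy: Local, replies traverse the host
# network stack, so host routes are honored.
apiVersion: v1
kind: Service
metadata:
  name: example-lb        # hypothetical name
  namespace: example-ns   # hypothetical namespace
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # "Local" instead of "Cluster"
  selector:
    app: example
  ports:
  - port: 80
    targetPort: 8080
```

Note that "Local" also changes load-balancing semantics: only nodes with a local endpoint serve the traffic.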
The topology is as follows: the node is connected to multiple routers, one of which is also the default gateway. The traffic arrives from a router that is not the default gateway, and routes were added to the node to steer the traffic back to the client. What happens is that the traffic comes into the node, but the reply is sent to the default gateway.

With a Service using Traffic Policy = Cluster, the host routes are ignored because all the traffic stays inside br-ex / OVN. With a Service using Traffic Policy = Local, the traffic lands on the host and it works. My understanding is that such a configuration was working until 4.10.

The idea is to see if we can steer the traffic into the host for services in lgw, let it hit the host routes, use the .2 IP to take it back to br-ex, and have the reply come back the same way (hopefully). Testing this.

Upstream PR merged; the downstream PR is on its way to merge. Will be backported to 4.10.z.

PR merged.

Verified:
Client Version: 4.11.0-0.nightly-2022-04-26-030643
Kustomize Version: v4.5.4
Server Version: 4.11.0-0.nightly-2022-04-26-030643
Kubernetes Version: v1.23.3+d464c70
metallb-operator.4.11.0-202203281806
====================================================
```yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-04-27T15:27:09Z"
  name: hello-world
  namespace: arti-test
  resourceVersion: "1055625"
  uid: 11f8a8f6-42aa-45a9-9208-cb3081af95f7
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.75.111
  clusterIPs:
  - 172.30.75.111
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 30640
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: hello-world
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.10.10.10
```
bash-5.1# wget -qO- 10.10.10.10
Hello Kubernetes!
13:06:13.985017 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [S], seq 300684042, win 29200, options [mss 1460,sackOK,TS val 3961739875 ecr 0,nop,wscale 7], length 0
13:06:13.988380 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [S.], seq 96525721, ack 300684043, win 26960, options [mss 1360,sackOK,TS val 3371233124 ecr 3961739875,nop,wscale 7], length 0
13:06:13.988790 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [.], ack 1, win 229, options [nop,nop,TS val 3961739880 ecr 3371233124], length 0
13:06:13.988849 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [P.], seq 1:75, ack 1, win 229, options [nop,nop,TS val 3961739880 ecr 3371233124], length 74: HTTP: GET / HTTP/1.1
13:06:13.989943 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [.], ack 75, win 211, options [nop,nop,TS val 3371233127 ecr 3961739880], length 0
13:06:13.990289 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [P.], seq 1:132, ack 75, win 211, options [nop,nop,TS val 3371233127 ecr 3961739880], length 131: HTTP: HTTP/1.1 200 OK
13:06:13.990356 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [F.], seq 132, ack 75, win 211, options [nop,nop,TS val 3371233127 ecr 3961739880], length 0
13:06:13.990389 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [.], ack 132, win 237, options [nop,nop,TS val 3961739881 ecr 3371233127], length 0
13:06:13.990436 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [F.], seq 75, ack 132, win 237, options [nop,nop,TS val 3961739881 ecr 3371233127], length 0
13:06:13.990449 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [.], ack 133, win 237, options [nop,nop,TS val 3961739881 ecr 3371233127], length 0
13:06:13.990503 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [.], ack 76, win 211, options [nop,nop,TS val 3371233127 ecr 3961739881], length 0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
Created attachment 1863893 [details]: topology

Description of problem:

With `routingViaHost: true` and `externalTrafficPolicy: Cluster`, trying to send HTTP traffic from a test container (topology attached) towards an NGINX pod. A static route that was added on a speaker pod node doesn't work when the traffic comes back.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-local
  namespace: default
  annotations:
    metallb.universe.tf/address-pool: addresspool3
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx-local
  type: LoadBalancer
  externalTrafficPolicy: Cluster
```

Version-Release number of selected component (if applicable):
4.10.0-rc.5
metallb-operator.4.10.0-202202160023

How reproducible: 100%

Steps to Reproduce:
1. Create the MetalLB layer 3 scenario according to the attached topology.
2. test-container# wget -qO- 4.4.4.1

Actual results: traffic fails

Expected results: traffic passes

Additional info:

1) Works fine with "externalTrafficPolicy: Local".

2) Capture of the reply leaving the node:

11:34:59.104722 34:48:ed:f3:6b:7c > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60) 4.4.4.1.80 > 10.100.100.254.54538: Flags [S.], cksum 0xb247 (incorrect -> 0xeae5), seq 414318286, ack 2174062796, win 26960, options [mss 1360,sackOK,TS val 3872640667 ecr 2179664749,nop,wscale 7], length 0

Default gateway: 10.46.55.254 dev br-ex lladdr 00:00:5e:00:01:01 REACHABLE
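For context, the `routingViaHost: true` setting mentioned in the description selects OVN-Kubernetes local gateway mode (LGW), and is set on the cluster Network operator object. A sketch of that configuration (field names as in the OpenShift Network operator API; any other defaults are omitted):

```yaml
# Local gateway mode: egress traffic is routed through the host's
# routing table instead of leaving directly via the OVN gateway.
apiVersion: operator.openshift.io/v1
kind: Network
metadata:
  name: cluster
spec:
  defaultNetwork:
    type: OVNKubernetes
    ovnKubernetesConfig:
      gatewayConfig:
        routingViaHost: true
```

This is the mode in which the bug reproduces: host routes should influence reply traffic, but with ETP=Cluster they were being bypassed.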