Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 2060159

Summary: LGW: External->Service of type ETP=Cluster doesn't go to the node
Product: OpenShift Container Platform
Reporter: elevin
Component: Networking
Assignee: Surya Seetharaman <surya>
Networking sub component: ovn-kubernetes
QA Contact: elevin
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: cgoncalves, djuran, fbaudin, fherrman, fpaoline, mmahmoud, surya, wking
Version: 4.10
Keywords: Regression, Upgrades
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: Host routes are ignored when load balancer type Services are configured with "Cluster" traffic policy.
Consequence: Egress traffic of load balancer type Services is always steered to the default gateway, regardless of whether there is a better matching route on the host routing table.
Workaround (if any): Set load balancer type Services to "Local" traffic policy.
Result:
Story Points: ---
Clone Of:
: 2073137 (view as bug list)
Environment:
Last Closed: 2022-08-10 10:51:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 2073137
Attachments: topology (flags: none)

Description elevin 2022-03-02 21:33:00 UTC
Created attachment 1863893
topology

Description of problem:
routingViaHost: true
externalTrafficPolicy: Cluster

Trying to send HTTP traffic from a test container (topology attached) to an NGINX pod. The static route that was added on a speaker pod does not take effect for the return traffic.

apiVersion: v1
kind: Service
metadata:
  name: nginx-local
  namespace: default
  annotations:
    metallb.universe.tf/address-pool: addresspool3
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: nginx-local
  type: LoadBalancer
  externalTrafficPolicy: Cluster
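
The workaround recorded in the Doc Text field (switching the Service to the "Local" traffic policy) could be applied to this Service roughly as follows. This is a minimal sketch, not a tested procedure: the helper name `build_patch_command` is hypothetical, the Service name and namespace are copied from the manifest above, and the script only prints the `oc patch` command instead of running it. Note that "Local" also limits delivery to nodes that host an endpoint of the Service.

```python
import json
import shlex


def build_patch_command(name, namespace, policy="Local"):
    """Return the argv of an `oc patch` call that sets externalTrafficPolicy.

    Hypothetical helper for illustration; it builds the command but does
    not execute it.
    """
    if policy not in ("Local", "Cluster"):
        raise ValueError(f"unsupported externalTrafficPolicy: {policy!r}")
    # JSON merge patch touching only the traffic policy field.
    patch = {"spec": {"externalTrafficPolicy": policy}}
    return [
        "oc", "patch", "service", name,
        "-n", namespace,
        "--type=merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Name and namespace come from the Service manifest in this report.
    print(shlex.join(build_patch_command("nginx-local", "default")))
```

The printed command can be reviewed and then run by hand against the affected cluster.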



Version-Release number of selected component (if applicable):
4.10.0-rc.5
metallb-operator.4.10.0-202202160023

How reproducible:
100%

Steps to Reproduce:
1. Create a MetalLB layer 3 scenario according to the attached topology.
2. test-container# wget -qO- 4.4.4.1

Actual results:
Traffic fails.

Expected results:
Traffic passes.

Additional info:
1) Works fine with "externalTrafficPolicy: Local".

2) Capture of the reply (SYN-ACK). Its destination MAC, 00:00:5e:00:01:01, is the MAC of the default gateway listed below:
11:34:59.104722 34:48:ed:f3:6b:7c > 00:00:5e:00:01:01, ethertype IPv4 (0x0800), length 74: (tos 0x0, ttl 62, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    4.4.4.1.80 > 10.100.100.254.54538: Flags [S.], cksum 0xb247 (incorrect -> 0xeae5), seq 414318286, ack 2174062796, win 26960, options [mss 1360,sackOK,TS val 3872640667 ecr 2179664749,nop,wscale 7], length 0

Default Gateway:
10.46.55.254 dev br-ex lladdr 00:00:5e:00:01:01 REACHABLE
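
The comparison between the capture and the neighbor entry can be checked mechanically. The sketch below is only an illustration: it takes the Ethernet header portion of the capture line and the neighbor entry quoted above, extracts the MAC addresses with a regular expression, and reports whether the reply frame was addressed to the default gateway. The function names are hypothetical.

```python
import re

# Ethernet header portion of the capture line quoted in this report.
CAPTURE_LINE = (
    "11:34:59.104722 34:48:ed:f3:6b:7c > 00:00:5e:00:01:01, "
    "ethertype IPv4 (0x0800), length 74"
)
# Neighbor entry of the default gateway quoted in this report.
NEIGH_LINE = "10.46.55.254 dev br-ex lladdr 00:00:5e:00:01:01 REACHABLE"

MAC = r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}"


def frame_dst_mac(line):
    """Return the destination MAC from a `src > dst` link-level header."""
    match = re.search(rf"({MAC}) > ({MAC})", line.lower())
    if match is None:
        raise ValueError("no link-level header found")
    return match.group(2)


def neigh_mac(line):
    """Return the lladdr value from a neighbor table entry."""
    match = re.search(rf"lladdr ({MAC})", line.lower())
    if match is None:
        raise ValueError("no lladdr found")
    return match.group(1)


if __name__ == "__main__":
    sent_to_gateway = frame_dst_mac(CAPTURE_LINE) == neigh_mac(NEIGH_LINE)
    print("reply sent to default gateway:", sent_to_gateway)
```

A match means the SYN-ACK left the node toward the default gateway rather than toward the router the request arrived from.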

Comment 1 Federico Paolinelli 2022-03-03 08:26:12 UTC
The topology is as follows:

The node is connected to multiple routers, one of them is also the default gateway.
The traffic is coming from a router that is not the default gateway. Routes were added to the node, to steer the traffic back to the client.

What happens is, the traffic comes into the node, but the reply is sent to the default gateway.

With a Service with Traffic Policy = Cluster, the host routes are ignored because all of the traffic is handled inside br-ex / OVN.
With a Service with Traffic Policy = Local, the traffic is delivered to the host, where the routes apply, and it works.

My understanding is that this configuration worked before 4.10.
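
The difference between the expected and the observed behavior can be modeled as a longest-prefix-match lookup. The following is a simplified sketch, not a description of how OVN or the kernel forwards packets: the default gateway 10.46.55.254 and the client address 10.100.100.254 come from this report, while the static route 10.100.100.0/24 and its next hop 192.0.2.1 are hypothetical placeholders for the routes that were added to the node.

```python
import ipaddress

# Routes as the host would see them. Only the default gateway is taken
# from the report; the static route and its next hop are hypothetical.
HOST_ROUTES = [
    ("0.0.0.0/0", "10.46.55.254"),
    ("10.100.100.0/24", "192.0.2.1"),
]

# Model of the reported behavior: the host routes are not consulted,
# so only the default route is effectively in use.
DEFAULT_ONLY = [("0.0.0.0/0", "10.46.55.254")]


def next_hop(dst, routes):
    """Pick the next hop of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, hop in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, hop)
    if best is None:
        raise LookupError(f"no route to {dst}")
    return best[1]


if __name__ == "__main__":
    client = "10.100.100.254"
    print("expected next hop:", next_hop(client, HOST_ROUTES))
    print("observed next hop:", next_hop(client, DEFAULT_ONLY))
```

With the static route present, the reply should leave through the more specific next hop; when the routes are ignored, it falls through to the default gateway, which matches the capture in the description.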

Comment 3 Surya Seetharaman 2022-03-03 17:07:17 UTC
The idea is to see whether, in LGW (local gateway) mode, we can steer Service traffic into the host, let it hit the host routes, use the .2 IP to take it back to br-ex, and have the reply come back the same way (hopefully). Testing this.

Comment 7 Surya Seetharaman 2022-04-05 11:01:33 UTC
The upstream PR is merged and the downstream PR is on its way to merge. The fix will be backported to 4.10.z.

Comment 8 Surya Seetharaman 2022-04-06 21:10:45 UTC
PR merged.

Comment 14 elevin 2022-04-28 13:06:39 UTC
Verified:
Client Version: 4.11.0-0.nightly-2022-04-26-030643
Kustomize Version: v4.5.4
Server Version: 4.11.0-0.nightly-2022-04-26-030643
Kubernetes Version: v1.23.3+d464c70
metallb-operator.4.11.0-202203281806
====================================================
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2022-04-27T15:27:09Z"
  name: hello-world
  namespace: arti-test
  resourceVersion: "1055625"
  uid: 11f8a8f6-42aa-45a9-9208-cb3081af95f7
spec:
  allocateLoadBalancerNodePorts: true
  clusterIP: 172.30.75.111
  clusterIPs:
  - 172.30.75.111
  externalTrafficPolicy: Cluster
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - nodePort: 30640
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: hello-world
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer:
    ingress:
    - ip: 10.10.10.10


bash-5.1# wget -qO- 10.10.10.10
Hello Kubernetes!

13:06:13.985017 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [S], seq 300684042, win 29200, options [mss 1460,sackOK,TS val 3961739875 ecr 0,nop,wscale 7], length 0
13:06:13.988380 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [S.], seq 96525721, ack 300684043, win 26960, options [mss 1360,sackOK,TS val 3371233124 ecr 3961739875,nop,wscale 7], length 0
13:06:13.988790 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [.], ack 1, win 229, options [nop,nop,TS val 3961739880 ecr 3371233124], length 0
13:06:13.988849 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [P.], seq 1:75, ack 1, win 229, options [nop,nop,TS val 3961739880 ecr 3371233124], length 74: HTTP: GET / HTTP/1.1
13:06:13.989943 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [.], ack 75, win 211, options [nop,nop,TS val 3371233127 ecr 3961739880], length 0
13:06:13.990289 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [P.], seq 1:132, ack 75, win 211, options [nop,nop,TS val 3371233127 ecr 3961739880], length 131: HTTP: HTTP/1.1 200 OK
13:06:13.990356 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [F.], seq 132, ack 75, win 211, options [nop,nop,TS val 3371233127 ecr 3961739880], length 0
13:06:13.990389 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [.], ack 132, win 237, options [nop,nop,TS val 3961739881 ecr 3371233127], length 0
13:06:13.990436 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [F.], seq 75, ack 132, win 237, options [nop,nop,TS val 3961739881 ecr 3371233127], length 0
13:06:13.990449 IP 172.16.0.1.37264 > 10.10.10.10.http: Flags [.], ack 133, win 237, options [nop,nop,TS val 3961739881 ecr 3371233127], length 0
13:06:13.990503 IP 10.10.10.10.http > 172.16.0.1.37264: Flags [.], ack 76, win 211, options [nop,nop,TS val 3371233127 ecr 3961739881], length 0

Comment 16 errata-xmlrpc 2022-08-10 10:51:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069