Bug 2038309 - MetalLB with Local externalTrafficPolicy does not work with OVNKubernetes when using Local GW mode
Summary: MetalLB with Local externalTrafficPolicy does not work with OVNKubernetes when using Local GW mode
Keywords:
Status: CLOSED DUPLICATE of bug 2031012
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: Surya Seetharaman
QA Contact: Anurag saxena
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2022-01-07 18:47 UTC by Jose Castillo Lema
Modified: 2022-01-07 21:06 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-01-07 21:03:55 UTC
Target Upstream Version:
Embargoed:



Description Jose Castillo Lema 2022-01-07 18:47:22 UTC
Description of problem:
We have a small bare-metal cluster running OVNKubernetes. Whenever we change the externalTrafficPolicy of a MetalLB LoadBalancer service from Cluster to Local, we lose connectivity to the service (this happens in both L2 and L3 mode).
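For reference, the policy flip that triggers the problem can be done with a one-liner like the following (the service name and namespace are just the ones used in the examples further down, nothing special about them):

$ oc patch svc web-service-l3 -n default --type merge -p '{"spec":{"externalTrafficPolicy":"Local"}}'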

Version-Release number of selected component (if applicable):
 - OCP 4.10 nightly
 - MetalLB operator v4.10.0-202112241546 (downstream)

How reproducible:
100%

Steps to Reproduce:
1. Create a LoadBalancer-type service with externalTrafficPolicy=Local (a minimal manifest sketch follows these steps)
2. Try to reach the service through its ExternalIP
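
A minimal sketch of step 1 (the names, ports and the MetalLB address-pool annotation are illustrative, taken from the service shown further down):

$ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Service
metadata:
  name: web-service-l3
  namespace: default
  annotations:
    metallb.universe.tf/address-pool: addresspool-l3
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: web-server-l3
  ports:
  - name: http
    port: 8080
    targetPort: 8080
EOF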

Actual results:
1. The service is not reachable

Expected results:
1. The service should be reachable (as is when externalTrafficPolicy=Cluster)

Additional info:

This is our setup with local mode and L3:
$ oc get svc
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                       AGE
web-service-l3   LoadBalancer   172.30.118.196   10.10.10.10   8080:30719/TCP,80:32376/TCP   6m38s

The routes are there:
$ ip r
10.10.10.10 proto bgp metric 20 
	nexthop via 192.168.216.13 dev baremetal weight 1 
	nexthop via 192.168.216.14 dev baremetal weight 1

We can reach both workers through their node IPs:
$ curl 192.168.216.13:32376
<!DOCTYPE html>
$ curl 192.168.216.14:32376
<!DOCTYPE html>

However, no luck from the external IP (this only happens in Local mode; Cluster mode works just fine):
$ curl 10.10.10.10 --connect-timeout 3
curl: (28) Connection timed out after 3001 milliseconds

Comment 2 Surya Seetharaman 2022-01-07 21:01:56 UTC
Troubleshooting steps:

We created a deployment; all endpoints were on the same node, worker000:

[kni@f12-h17-b07-5039ms surya]$ oc get pods -owide
NAME                             READY   STATUS    RESTARTS   AGE    IP               NODE               NOMINATED NODE   READINESS GATES
web-server-l3-7855447cdc-cn5nm   1/1     Running   0          87m    10.128.3.198     worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-l25wx   1/1     Running   0          87m    10.128.3.200     worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-r25g9   1/1     Running   0          87m    10.128.3.197     worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-rx9jv   1/1     Running   0          87m    10.128.3.199     worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-snngh   1/1     Running   0          87m    10.128.3.196     worker000-5039ms   <none>           <none>

We created the LB service:

$ oc describe svc web-service-l3
Name:                     web-service-l3
Namespace:                default
Labels:                   app=http-1
                          group=kb-mb-wl
Annotations:              metallb.universe.tf/address-pool: addresspool-l3
Selector:                 app=web-server-l3
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.226.131
IPs:                      172.30.226.131
LoadBalancer Ingress:     10.10.10.10
Port:                     http  8080/TCP
TargetPort:               8080/TCP
NodePort:                 http  32229/TCP
Endpoints:                10.128.3.196:8080,10.128.3.197:8080,10.128.3.198:8080 + 2 more...
Port:                     http-2  80/TCP
TargetPort:               80/TCP
NodePort:                 http-2  31220/TCP
Endpoints:                10.128.3.196:80,10.128.3.197:80,10.128.3.198:80 + 2 more...
Session Affinity:         None
External Traffic Policy:  Local


We checked the flows on br-ex:

sh-4.4# ovs-ofctl dump-flows br-ex | grep 10.10.10.10
 cookie=0x2920e98f218c68aa, duration=1348.311s, table=0, n_packets=0, n_bytes=0, idle_age=1348, priority=110,tcp,in_port=1,nw_dst=10.10.10.10,tp_dst=8080 actions=ct(commit,table=6,zone=64003)
 cookie=0xd093673aa729ad2, duration=1348.311s, table=0, n_packets=19, n_bytes=1406, idle_age=388, priority=110,tcp,in_port=1,nw_dst=10.10.10.10,tp_dst=80 actions=ct(commit,table=6,zone=64003)
 cookie=0x2920e98f218c68aa, duration=1348.311s, table=0, n_packets=0, n_bytes=0, idle_age=1348, priority=110,tcp,in_port=LOCAL,nw_src=10.10.10.10,tp_src=8080 actions=ct(table=7,zone=64003)
 cookie=0xd093673aa729ad2, duration=1348.311s, table=0, n_packets=0, n_bytes=0, idle_age=1348, priority=110,tcp,in_port=LOCAL,nw_src=10.10.10.10,tp_src=80 actions=ct(table=7,zone=64003)

Looking at the second flow (tp_dst=80), it was clear that packets were entering the node (n_packets=19), while the matching return flow (tp_src=80) had n_packets=0, so the response was not coming back out.
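
(To confirm, one can keep curling the VIP and re-run the dump, watching whether only the ingress counters move while the return counters stay at 0; a sketch of that check:)

sh-4.4# ovs-ofctl dump-flows br-ex | grep 'nw_dst=10.10.10.10'   # ingress flows: n_packets grows with each curl
sh-4.4# ovs-ofctl dump-flows br-ex | grep 'nw_src=10.10.10.10'   # return flows: n_packets stays at 0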

We realized that these were the ETP=Local flows for LGW (local gateway) mode:

I0107 19:12:38.020566   40592 config.go:1714] Gateway config: {Mode:local Interface:br-ex EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 DisablePacketMTUCheck:false RouterSubnet:}
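
(That line comes from the ovnkube-node logs; assuming the usual openshift-ovn-kubernetes daemonset layout, the gateway mode can be double-checked with something like:)

$ oc logs -n openshift-ovn-kubernetes ds/ovnkube-node -c ovnkube-node | grep 'Gateway config'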


Looking at the logs, nothing looked suspicious:

I0107 19:23:05.177756   40592 port_claim.go:182] Handle NodePort service web-service-l3 port 32347
I0107 19:23:05.177794   40592 port_claim.go:40] Opening socket for service: default/web-service-l3, port: 32347 and protocol TCP
I0107 19:23:05.177803   40592 port_claim.go:63] Opening socket for LocalPort "nodePort for default/web-service-l3:http" (:32347/tcp)
I0107 19:23:05.177913   40592 port_claim.go:182] Handle NodePort service web-service-l3 port 31609
I0107 19:23:05.177921   40592 port_claim.go:40] Opening socket for service: default/web-service-l3, port: 31609 and protocol TCP
I0107 19:23:05.177926   40592 port_claim.go:63] Opening socket for LocalPort "nodePort for default/web-service-l3:http-2" (:31609/tcp)
I0107 19:23:05.177966   40592 healthcheck.go:142] Opening healthcheck "default/web-service-l3" on port 32043
I0107 19:23:05.178019   40592 gateway_shared_intf.go:528] Adding service web-service-l3 in namespace default
I0107 19:23:05.178031   40592 gateway_shared_intf.go:532] No endpoint found for service web-service-l3 in namespace default during service Add
I0107 19:23:05.178037   40592 gateway_shared_intf.go:541] Service Add web-service-l3 event in namespace default came before endpoint event setting svcConfig
I0107 19:23:05.178048   40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.178063   40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.178090   40592 healthcheck.go:167] Starting goroutine for healthcheck "default/web-service-l3" on port 32043
I0107 19:23:05.191957   40592 healthcheck.go:222] Reporting 5 endpoints for healthcheck "default/web-service-l3"
I0107 19:23:05.191965   40592 gateway_shared_intf.go:644] Adding endpoints web-service-l3 in namespace default
I0107 19:23:05.234680   40592 gateway_shared_intf.go:567] Deleting old service rules for: &Service{ObjectMeta:{web-service-l3  default  43096508-5382-4bba-8ea8-fb590b32d8f5 11650838 0 2022-01-07 19:23:05 +0000 UTC <nil> <nil> map[app:http-1 group:kb-mb-wl] map[metallb.universe.tf/address-pool:addresspool-l3-b] [] []  [{kubectl-create Update v1 2022-01-07 19:23:05 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:metallb.universe.tf/address-pool":{}},"f:labels":{".":{},"f:app":{},"f:group":{}}},"f:spec":{"f:allocateLoadBalancerNodePorts":{},"f:externalTrafficPolicy":{},"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":8080,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:selector":{},"f:sessionAffinity":{},"f:type":{}}} }]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:http,Protocol:TCP,Port:8080,TargetPort:{0 8080 },NodePort:32347,AppProtocol:nil,},ServicePort{Name:http-2,Protocol:TCP,Port:80,TargetPort:{0 80 },NodePort:31609,AppProtocol:nil,},},Selector:map[string]string{app: web-server-l3,},ClusterIP:172.30.10.231,Type:LoadBalancer,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:Local,HealthCheckNodePort:32043,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:*SingleStack,ClusterIPs:[172.30.10.231],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:*true,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},}
I0107 19:23:05.286234   40592 gateway_shared_intf.go:572] Adding new service rules for: &Service{ObjectMeta:{web-service-l3  default  43096508-5382-4bba-8ea8-fb590b32d8f5 11650839 0 2022-01-07 19:23:05 +0000 UTC <nil> <nil> map[app:http-1 group:kb-mb-wl] map[metallb.universe.tf/address-pool:addresspool-l3-b] [] []  [{controller Update v1 2022-01-07 19:23:05 +0000 UTC FieldsV1 {"f:status":{"f:loadBalancer":{"f:ingress":{}}}} status} {kubectl-create Update v1 2022-01-07 19:23:05 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:metallb.universe.tf/address-pool":{}},"f:labels":{".":{},"f:app":{},"f:group":{}}},"f:spec":{"f:allocateLoadBalancerNodePorts":{},"f:externalTrafficPolicy":{},"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":8080,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:selector":{},"f:sessionAffinity":{},"f:type":{}}} }]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:http,Protocol:TCP,Port:8080,TargetPort:{0 8080 },NodePort:32347,AppProtocol:nil,},ServicePort{Name:http-2,Protocol:TCP,Port:80,TargetPort:{0 80 },NodePort:31609,AppProtocol:nil,},},Selector:map[string]string{app: web-server-l3,},ClusterIP:172.30.10.231,Type:LoadBalancer,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:Local,HealthCheckNodePort:32043,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:*SingleStack,ClusterIPs:[172.30.10.231],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:*true,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{LoadBalancerIngress{IP:6.6.6.6,Hostname:,Ports:[]PortStatus{},},},},Conditions:[]Condition{},},}
I0107 19:23:05.286339   40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.286355   40592 gateway_shared_intf.go:325] Adding flows on breth0 for Ingress Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.286365   40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.286376   40592 gateway_shared_intf.go:325] Adding flows on breth0 for Ingress Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local


Next, we checked whether all the iptables rules were intact:
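
(A sketch of one way to dump the relevant chains on the node that holds the endpoints:)

sh-4.4# iptables-save -t nat | grep -E 'OVN-KUBE-(NODEPORT|EXTERNALIP|SNAT-MGMTPORT)'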

We could see the rules for NodePort:
[0:0] -A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 31220 -j DNAT --to-destination 169.254.169.3:31220
[0:0] -A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 32229 -j DNAT --to-destination 169.254.169.3:32229

and the return rules that prevent SNAT:
[0:0] -A OVN-KUBE-SNAT-MGMTPORT -p tcp -m tcp --dport 31220 -j RETURN
[0:0] -A OVN-KUBE-SNAT-MGMTPORT -p tcp -m tcp --dport 32229 -j RETURN

but the OVN-KUBE-EXTERNALIP rule, which is needed to steer the packet into ovn-k8s-mp0, was missing. Since the external IP here is an LB ingress VIP, we were missing this fix: https://github.com/openshift/ovn-kubernetes/pull/888, which creates the iptables rules for ingress IPs as well.

We built a custom image including this fix (quay.io/itssurya/dev-images:af09cb6c-37b6-4d12-a463-8e2b91f49c19), tested it on the cluster, and indeed the OVN-KUBE-EXTERNALIP iptables rules were now created:

[1:60] -A OVN-KUBE-EXTERNALIP -d 10.10.10.10/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 169.254.169.3:31220
[0:0] -A OVN-KUBE-EXTERNALIP -d 10.10.10.10/32 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 169.254.169.3:32229


A curl to the external IP then worked:
[kni@f12-h17-b07-5039ms ~]$ curl 10.10.10.10
<!DOCTYPE html>
<html>
<head>
<title>Hello World</title>

This is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=2031012

Comment 3 Mohamed Mahmoud 2022-01-07 21:03:55 UTC

*** This bug has been marked as a duplicate of bug 2031012 ***

