Bug 2038309
Summary: | MetalLB with Local externalTrafficPolicy does not work with OVNKubernetes when using Local GW mode | |
---|---|---|---
Product: | OpenShift Container Platform | Reporter: | Jose Castillo Lema <jlema>
Component: | Networking | Assignee: | Surya Seetharaman <surya>
Networking sub component: | ovn-kubernetes | QA Contact: | Anurag saxena <anusaxen>
Status: | CLOSED DUPLICATE | Docs Contact: |
Severity: | medium | |
Priority: | medium | CC: | dblack, jlema, mmahmoud, murali, surya
Version: | 4.10 | |
Target Milestone: | --- | |
Target Release: | 4.10.0 | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2022-01-07 21:03:55 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Description
Jose Castillo Lema
2022-01-07 18:47:22 UTC
Troubleshooting steps:

We created a deployment; all endpoints landed on the same node, worker000:

```
[kni@f12-h17-b07-5039ms surya]$ oc get pods -owide
NAME                             READY   STATUS    RESTARTS   AGE   IP             NODE               NOMINATED NODE   READINESS GATES
web-server-l3-7855447cdc-cn5nm   1/1     Running   0          87m   10.128.3.198   worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-l25wx   1/1     Running   0          87m   10.128.3.200   worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-r25g9   1/1     Running   0          87m   10.128.3.197   worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-rx9jv   1/1     Running   0          87m   10.128.3.199   worker000-5039ms   <none>           <none>
web-server-l3-7855447cdc-snngh   1/1     Running   0          87m   10.128.3.196   worker000-5039ms   <none>           <none>
```

We created the LB service:

```
oc describe svc web-service-l3
Name:                     web-service-l3
Namespace:                default
Labels:                   app=http-1
                          group=kb-mb-wl
Annotations:              metallb.universe.tf/address-pool: addresspool-l3
Selector:                 app=web-server-l3
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.226.131
IPs:                      172.30.226.131
LoadBalancer Ingress:     10.10.10.10
Port:                     http  8080/TCP
TargetPort:               8080/TCP
NodePort:                 http  32229/TCP
Endpoints:                10.128.3.196:8080,10.128.3.197:8080,10.128.3.198:8080 + 2 more...
Port:                     http-2  80/TCP
TargetPort:               80/TCP
NodePort:                 http-2  31220/TCP
Endpoints:                10.128.3.196:80,10.128.3.197:80,10.128.3.198:80 + 2 more...
Session Affinity:         None
External Traffic Policy:  Local
```

We checked the flows on br-ex:

```
sh-4.4# ovs-ofctl dump-flows br-ex | grep 10.10.10.10
cookie=0x2920e98f218c68aa, duration=1348.311s, table=0, n_packets=0, n_bytes=0, idle_age=1348, priority=110,tcp,in_port=1,nw_dst=10.10.10.10,tp_dst=8080 actions=ct(commit,table=6,zone=64003)
cookie=0xd093673aa729ad2, duration=1348.311s, table=0, n_packets=19, n_bytes=1406, idle_age=388, priority=110,tcp,in_port=1,nw_dst=10.10.10.10,tp_dst=80 actions=ct(commit,table=6,zone=64003)
cookie=0x2920e98f218c68aa, duration=1348.311s, table=0, n_packets=0, n_bytes=0, idle_age=1348, priority=110,tcp,in_port=LOCAL,nw_src=10.10.10.10,tp_src=8080 actions=ct(table=7,zone=64003)
cookie=0xd093673aa729ad2, duration=1348.311s, table=0, n_packets=0, n_bytes=0, idle_age=1348, priority=110,tcp,in_port=LOCAL,nw_src=10.10.10.10,tp_src=80 actions=ct(table=7,zone=64003)
```

Looking at the second flow (n_packets=19), it was clear that packets were getting into the node, but the responses were not coming back out.
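For reference, a manifest along these lines should reproduce the service described above. This is only a sketch reconstructed from the `oc describe svc` output: the name, namespace, labels, selector, ports and address-pool annotation are taken from that output, while everything else is an assumption rather than the exact manifest that was used:

```bash
# Sketch: recreate the ETP=Local LoadBalancer service from the description above.
# Values are copied from the `oc describe svc web-service-l3` output; anything not
# shown there (e.g. the exact original manifest layout) is an assumption.
oc apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web-service-l3
  namespace: default
  labels:
    app: http-1
    group: kb-mb-wl
  annotations:
    metallb.universe.tf/address-pool: addresspool-l3
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: web-server-l3
  ports:
  - name: http
    port: 8080
    targetPort: 8080
  - name: http-2
    port: 80
    targetPort: 80
EOF
```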
We realized that these br-ex flows were the ETP=local flows for LGW mode:

```
I0107 19:12:38.020566 40592 config.go:1714] Gateway config: {Mode:local Interface:br-ex EgressGWInterface: NextHop: VLANID:0 NodeportEnable:true DisableSNATMultipleGWs:false V4JoinSubnet:100.64.0.0/16 V6JoinSubnet:fd98::/64 DisablePacketMTUCheck:false RouterSubnet:}
```

Looking at the ovnkube-node logs, nothing was suspicious:

```
I0107 19:23:05.177756 40592 port_claim.go:182] Handle NodePort service web-service-l3 port 32347
I0107 19:23:05.177794 40592 port_claim.go:40] Opening socket for service: default/web-service-l3, port: 32347 and protocol TCP
I0107 19:23:05.177803 40592 port_claim.go:63] Opening socket for LocalPort "nodePort for default/web-service-l3:http" (:32347/tcp)
I0107 19:23:05.177913 40592 port_claim.go:182] Handle NodePort service web-service-l3 port 31609
I0107 19:23:05.177921 40592 port_claim.go:40] Opening socket for service: default/web-service-l3, port: 31609 and protocol TCP
I0107 19:23:05.177926 40592 port_claim.go:63] Opening socket for LocalPort "nodePort for default/web-service-l3:http-2" (:31609/tcp)
I0107 19:23:05.177966 40592 healthcheck.go:142] Opening healthcheck "default/web-service-l3" on port 32043
I0107 19:23:05.178019 40592 gateway_shared_intf.go:528] Adding service web-service-l3 in namespace default
I0107 19:23:05.178031 40592 gateway_shared_intf.go:532] No endpoint found for service web-service-l3 in namespace default during service Add
I0107 19:23:05.178037 40592 gateway_shared_intf.go:541] Service Add web-service-l3 event in namespace default came before endpoint event setting svcConfig
I0107 19:23:05.178048 40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.178063 40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.178090 40592 healthcheck.go:167] Starting goroutine for healthcheck "default/web-service-l3" on port 32043
I0107 19:23:05.191957 40592 healthcheck.go:222] Reporting 5 endpoints for healthcheck "default/web-service-l3"
I0107 19:23:05.191965 40592 gateway_shared_intf.go:644] Adding endpoints web-service-l3 in namespace default
I0107 19:23:05.234680 40592 gateway_shared_intf.go:567] Deleting old service rules for: &Service{ObjectMeta:{web-service-l3 default 43096508-5382-4bba-8ea8-fb590b32d8f5 11650838 0 2022-01-07 19:23:05 +0000 UTC <nil> <nil> map[app:http-1 group:kb-mb-wl] map[metallb.universe.tf/address-pool:addresspool-l3-b] [] [] [{kubectl-create Update v1 2022-01-07 19:23:05 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:metallb.universe.tf/address-pool":{}},"f:labels":{".":{},"f:app":{},"f:group":{}}},"f:spec":{"f:allocateLoadBalancerNodePorts":{},"f:externalTrafficPolicy":{},"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":8080,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:selector":{},"f:sessionAffinity":{},"f:type":{}}} }]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:http,Protocol:TCP,Port:8080,TargetPort:{0 8080 },NodePort:32347,AppProtocol:nil,},ServicePort{Name:http-2,Protocol:TCP,Port:80,TargetPort:{0 80 },NodePort:31609,AppProtocol:nil,},},Selector:map[string]string{app: web-server-l3,},ClusterIP:172.30.10.231,Type:LoadBalancer,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:Local,HealthCheckNodePort:32043,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:*SingleStack,ClusterIPs:[172.30.10.231],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:*true,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{},},Conditions:[]Condition{},},}
I0107 19:23:05.286234 40592 gateway_shared_intf.go:572] Adding new service rules for: &Service{ObjectMeta:{web-service-l3 default 43096508-5382-4bba-8ea8-fb590b32d8f5 11650839 0 2022-01-07 19:23:05 +0000 UTC <nil> <nil> map[app:http-1 group:kb-mb-wl] map[metallb.universe.tf/address-pool:addresspool-l3-b] [] [] [{controller Update v1 2022-01-07 19:23:05 +0000 UTC FieldsV1 {"f:status":{"f:loadBalancer":{"f:ingress":{}}}} status} {kubectl-create Update v1 2022-01-07 19:23:05 +0000 UTC FieldsV1 {"f:metadata":{"f:annotations":{".":{},"f:metallb.universe.tf/address-pool":{}},"f:labels":{".":{},"f:app":{},"f:group":{}}},"f:spec":{"f:allocateLoadBalancerNodePorts":{},"f:externalTrafficPolicy":{},"f:internalTrafficPolicy":{},"f:ports":{".":{},"k:{\"port\":80,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}},"k:{\"port\":8080,\"protocol\":\"TCP\"}":{".":{},"f:name":{},"f:port":{},"f:protocol":{},"f:targetPort":{}}},"f:selector":{},"f:sessionAffinity":{},"f:type":{}}} }]},Spec:ServiceSpec{Ports:[]ServicePort{ServicePort{Name:http,Protocol:TCP,Port:8080,TargetPort:{0 8080 },NodePort:32347,AppProtocol:nil,},ServicePort{Name:http-2,Protocol:TCP,Port:80,TargetPort:{0 80 },NodePort:31609,AppProtocol:nil,},},Selector:map[string]string{app: web-server-l3,},ClusterIP:172.30.10.231,Type:LoadBalancer,ExternalIPs:[],SessionAffinity:None,LoadBalancerIP:,LoadBalancerSourceRanges:[],ExternalName:,ExternalTrafficPolicy:Local,HealthCheckNodePort:32043,PublishNotReadyAddresses:false,SessionAffinityConfig:nil,IPFamilyPolicy:*SingleStack,ClusterIPs:[172.30.10.231],IPFamilies:[IPv4],AllocateLoadBalancerNodePorts:*true,LoadBalancerClass:nil,InternalTrafficPolicy:*Cluster,},Status:ServiceStatus{LoadBalancer:LoadBalancerStatus{Ingress:[]LoadBalancerIngress{LoadBalancerIngress{IP:6.6.6.6,Hostname:,Ports:[]PortStatus{},},},},Conditions:[]Condition{},},}
I0107 19:23:05.286339 40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.286355 40592 gateway_shared_intf.go:325] Adding flows on breth0 for Ingress Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.286365 40592 gateway_shared_intf.go:207] Adding flows on breth0 for Nodeport Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
I0107 19:23:05.286376 40592 gateway_shared_intf.go:325] Adding flows on breth0 for Ingress Service web-service-l3 in Namespace: default since ExternalTrafficPolicy=local
```

Next, we checked whether all the iptables rules were intact. We could see the DNAT rules for the NodePorts:

```
[0:0] -A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 31220 -j DNAT --to-destination 169.254.169.3:31220
[0:0] -A OVN-KUBE-NODEPORT -p tcp -m addrtype --dst-type LOCAL -m tcp --dport 32229 -j DNAT --to-destination 169.254.169.3:32229
```
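As a side note, a quick way to dump the relevant ovn-kubernetes NAT chains directly on the node is a debug pod. This is only a sketch; it assumes cluster-admin access, and the node name is taken from the pod listing above:

```bash
# Sketch: inspect the ovn-kubernetes NAT chains on the node hosting the endpoints.
# Assumes `oc debug` access to the node; the node name comes from `oc get pods -owide` above.
oc debug node/worker000-5039ms -- chroot /host sh -c \
  'iptables-save -t nat | grep -E "OVN-KUBE-(NODEPORT|EXTERNALIP|SNAT-MGMTPORT)"'
```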
We also saw the return rules for preventing SNAT:

```
[0:0] -A OVN-KUBE-SNAT-MGMTPORT -p tcp -m tcp --dport 31220 -j RETURN
[0:0] -A OVN-KUBE-SNAT-MGMTPORT -p tcp -m tcp --dport 32229 -j RETURN
```

However, the OVN-KUBE-EXTERNALIP rule, which is necessary to get the packet into ovn-k8s-mp0, was missing. Since the external IP here is a LoadBalancer ingress VIP, we were missing this fix: https://github.com/openshift/ovn-kubernetes/pull/888, which creates the iptables rules for ingress IPs as well.

We built a custom image including this fix (quay.io/itssurya/dev-images:af09cb6c-37b6-4d12-a463-8e2b91f49c19), tested it on the cluster, and could indeed see the OVN-KUBE-EXTERNALIP iptables rules being created:

```
[1:60] -A OVN-KUBE-EXTERNALIP -d 10.10.10.10/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 169.254.169.3:31220
[0:0] -A OVN-KUBE-EXTERNALIP -d 10.10.10.10/32 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 169.254.169.3:32229
```

A curl to the VIP then worked:

```
[kni@f12-h17-b07-5039ms ~]$ curl 10.10.10.10
<!DOCTYPE html>
<html>
<head>
<title>Hello World</title>
```

This is a dupe of https://bugzilla.redhat.com/show_bug.cgi?id=2031012.

*** This bug has been marked as a duplicate of bug 2031012 ***
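For completeness, on an unpatched node the same datapath can be confirmed by hand by temporarily adding the DNAT rule that the fix programs (the rule below is copied from the output above). This is a diagnostic sketch only, not a substitute for the fix; it assumes the OVN-KUBE-EXTERNALIP chain already exists on that node, and the rule should be removed once the test is done:

```bash
# Diagnostic sketch only: on the node, manually add the DNAT rule for the LB VIP that
# https://github.com/openshift/ovn-kubernetes/pull/888 would program.
# Assumes the OVN-KUBE-EXTERNALIP chain exists; ovn-kubernetes owns this chain, so
# delete the rule again after testing.
iptables -t nat -A OVN-KUBE-EXTERNALIP -d 10.10.10.10/32 -p tcp -m tcp --dport 80 \
  -j DNAT --to-destination 169.254.169.3:31220

# From an external client, re-test the VIP.
curl 10.10.10.10

# Back on the node, remove the manual rule.
iptables -t nat -D OVN-KUBE-EXTERNALIP -d 10.10.10.10/32 -p tcp -m tcp --dport 80 \
  -j DNAT --to-destination 169.254.169.3:31220
```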