Description of problem:
Our tenant is creating a Service of type LoadBalancer. We use OVN in local gateway mode, so we expect iptables rules for the service to be present in the kernel. The rules are present, but a stale iptables rule is preventing the service from working.

Version-Release number of selected component (if applicable): 4.8.43

How reproducible: very

Steps to Reproduce: please see the comment below

Actual results: upon redeployment, iptables retains the stale IP for the LoadBalancer service, which causes connections to fail; a reboot of the node corrects the iptables rules. This was NOT happening in 4.8.34 and only started after the cluster was upgraded to 4.8.43.

Expected results: upon redeployment, iptables is refreshed to include the new IP for the service.

Additional info:
Hello, I've included the steps to reproduce the issue in the comment above. As stated earlier, this issue didn't appear until after the upgrade to 4.8.43 (from 4.8.34). My colleague @jocolema has found the likely PR causing this; please see his SFDC comment below:

```
I wanted to let you know where I am at here. I have been searching through some release notes to see if I could find a change that is similar to what we see between 4.8.34 and 4.8.43. I have found the following change in 4.8.36 which _might_ be relevant:
~~~
Going back through the release notes, I see the following in 4.8.36:
https://docs.openshift.com/container-platform/4.8/release_notes/ocp-4-8-release-notes.html#ocp-4-8-36
The errata this was posted in has a bug fix here:
https://bugzilla.redhat.com/show_bug.cgi?id=2057557
I see that this points us to the PR:
https://github.com/openshift/ovn-kubernetes/pull/967
https://github.com/openshift/ovn-kubernetes/pull/967/commits/96b67b33b9949dddfd5e122d5764c574c1907e30
~~~
This change is essentially focused around components like MetalLB, and the problem statement: 'Services of type loadbalancer do not work if the traffic reaches the node from an interface different from br-ex'

So I am still trying to figure out if this is the change you are seeing and MetalLB is just the primary purpose, whereas the fix is more global to iptables. If so, I am wondering if the focus is on shared gateway and hence we might be hitting a bug while using local gateway.
```

I'll include further information on how OVN is used in the customer cluster in another comment. The customer is a telco and is utilising the Ericsson blueprint dual-mode for 5GC deployments [1] (or E///). They are prohibited from upgrading to 4.10 until E/// is upgraded as well, and the MetalLB use here is likely tied to that. There are gathers/sosreports/iptables files attached to the SFDC case, but they were taken before the steps outlined above; I've requested fresh data to hopefully show what was happening during the reproducer. Once those are attached, this BZ will be updated to note it.

[1] https://gitlab.consulting.redhat.com/djuran/ericsson-blueprint/-/tree/release_5GC_TS-1.3#user-content-sr-iov
I've submitted the significant comments from the customer case ticket above. Please let me know whether anything further is needed to work this case. There is currently a workaround, but the customer is asking for a fix, as it is likely that the PRs highlighted in the previous comment broke something. I've marked the severity as high, and the customer is a high-priority account.
The customer has attached a fresh cluster/network gather to the ticket [1] and sosreports from the nodes where the pods in the test were scheduled [2]; it appears the iptables information is already included in the network gather in [1]. Please let me know whether anything further is required to begin this investigation.

[1] https://attachments.access.redhat.com/hydra/rest/cases/03277795/attachments/62598b2f-3a70-4c42-bc28-597c202bcb9f
[2] https://attachments.access.redhat.com/hydra/rest/cases/03277795/attachments/a0bc5a43-86e5-419c-a590-3475cb8ae87d
https://attachments.access.redhat.com/hydra/rest/cases/03277795/attachments/fd16201a-0ac9-4145-9f11-f34262b50121
https://attachments.access.redhat.com/hydra/rest/cases/03277795/attachments/9c82f488-ab87-4547-b221-b0f0baee26cd
// Debugging Notes

After investigating a bit more, I realised what is happening. Fixing https://github.com/openshift/ovn-kubernetes/pull/967 and https://github.com/openshift/ovn-kubernetes/pull/905 indirectly contributed to this bug. It's a slightly nasty bug that will only affect users:
1) in LGW mode,
2) using LoadBalancer services where svc.Status.LoadBalancer is set,
3) on 4.8 clusters if >= OCP 4.8.36,
4) on 4.9 clusters if >= OCP 4.9.23.

** Temporary workaround, of course, is to restart ovnkube-node until we run into this issue again.

OCP 4.8.43:

ovnkube-node startup:

    switch config.Gateway.Mode {
    case config.GatewayModeLocal:
        klog.Info("Preparing Local Gateway")
        gw, err = newLocalGateway(n.name, subnets, gatewayNextHops, gatewayIntf, nodeAnnotator, n.recorder, managementPortConfig)

followed by:

    initGw := func() error {
        return gw.Init(n.watchFactory)
    }

followed by:

    func (g *gateway) Init(wf factory.NodeWatchFactory) error {
        err := g.initFunc()
        if err != nil {
            return err
        }
        wf.AddServiceHandler(cache.ResourceEventHandlerFuncs{
            AddFunc: func(obj interface{}) {
                svc := obj.(*kapi.Service)
                g.AddService(svc)
            },
            UpdateFunc: func(old, new interface{}) {
                oldSvc := old.(*kapi.Service)
                newSvc := new.(*kapi.Service)
                g.UpdateService(oldSvc, newSvc)
            },
            DeleteFunc: func(obj interface{}) {
                svc := obj.(*kapi.Service)
                g.DeleteService(svc)
            },
        }, g.SyncServices)

followed by:

PATH-A:

    func (g *gateway) SyncServices(objs []interface{}) {
        <snip>
        if g.localPortWatcher != nil {
            g.localPortWatcher.SyncServices(objs)
        }
    }

AND PATH-B:

    func (g *gateway) AddService(svc *kapi.Service) {
        <snip>
        if g.localPortWatcher != nil {
            g.localPortWatcher.AddService(svc)
        }
    }

PATH-A leads to:

    func (l *localPortWatcher) SyncServices(serviceInterface []interface{}) {
        keepIPTRules := []iptRule{}
        for _, service := range serviceInterface {
            svc, ok := service.(*kapi.Service)
            if !ok {
                klog.Errorf("Spurious object in syncServices: %v", serviceInterface)
                continue
            }
            keepIPTRules = append(keepIPTRules, getGatewayIPTRules(svc, []string{l.gatewayIPv4, l.gatewayIPv6})...) // ---> PROBLEM IS HERE!!
        }
        for _, chain := range []string{iptableNodePortChain, iptableExternalIPChain} {
            recreateIPTRules("nat", chain, keepIPTRules)
        }
        // Previously LGW used routes in the localnetGatewayExternalIDTable, to handle
        // upgrades correctly make sure we flush this table of all routes
        klog.Infof("Flushing host's routing table: %s", localnetGatewayExternalIDTable)
        if _, stderr, err := util.RunIP("route", "flush", "table", localnetGatewayExternalIDTable); err != nil {
            klog.Errorf("Error flushing host's routing table: %s stderr: %s err: %v", localnetGatewayExternalIDTable, stderr, err)
        }
    }

followed by:

    func getGatewayIPTRules(service *kapi.Service, gatewayIPs []string) []iptRule {
        <snipped>
            externalIPs := make([]string, 0, len(service.Spec.ExternalIPs)+len(service.Status.LoadBalancer.Ingress))
            externalIPs = append(externalIPs, service.Spec.ExternalIPs...)
            for _, ingress := range service.Status.LoadBalancer.Ingress { // -----> NOW WE ADD THESE LB RULES TO EXTERNALIP CHAIN FOR SGW
                if len(ingress.IP) > 0 {
                    externalIPs = append(externalIPs, ingress.IP)
                }
            }
            for _, externalIP := range externalIPs {
                err := util.ValidatePort(svcPort.Protocol, svcPort.Port)
                if err != nil {
                    klog.Errorf("Skipping service: %s, invalid service port %v", svcPort.Name, err)
                    continue
                }
                if clusterIP, err := util.MatchIPStringFamily(utilnet.IsIPv6String(externalIP), clusterIPs); err == nil {
                    rules = append(rules, getExternalIPTRules(svcPort, externalIP, clusterIP)...)
                }
            }
        }
        return rules
    }

PATH-B leads to:

    func (l *localPortWatcher) addService(svc *kapi.Service) error {
        // don't process headless service or services that do not have NodePorts or ExternalIPs
        if !util.ServiceTypeHasClusterIP(svc) || !util.IsClusterIPSet(svc) {
            return nil
        }
        for _, ip := range util.GetClusterIPs(svc) {
            iptRules := []iptRule{}
            isIPv6Service := utilnet.IsIPv6String(ip)
            gatewayIP := l.gatewayIPv4
            if isIPv6Service {
                gatewayIP = l.gatewayIPv6
            }
            for _, port := range svc.Spec.Ports {
                // Fix Azure/GCP LoadBalancers. They will forward traffic directly to the node with the
                // dest address as the load-balancer ingress IP and port
                iptRules = append(iptRules, getLoadBalancerIPTRules(svc, port, ip, port.Port)...) // ----> NOW WE ADD THESE LB RULES TO NODEPORT CHAIN FOR LGW

So really the problem is calling getGatewayIPTRules from LGW and assuming all the rules are calculated the same way for both modes. When we fixed the SGW bug, we deviated in the rule addition, thus causing this bug. The fix should be a one-liner that checks the gateway mode before adding LB Status.Ingress rules to the EXTERNALIP chain.

NOTE: This fix will only go into 4.9 and 4.8, as doing this one-liner would break >= 4.10, where we do things the same way for both modes.
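For clarity, a minimal sketch of what that gateway-mode check could look like inside getGatewayIPTRules; this is not the actual merged patch, and the exact placement and the use of config.GatewayModeShared are my assumptions:

```
// Sketch only: gate the ingress-IP handling on shared gateway mode so that
// LGW does not feed LoadBalancer ingress IPs into the EXTERNALIP chain.
externalIPs := make([]string, 0, len(service.Spec.ExternalIPs)+len(service.Status.LoadBalancer.Ingress))
externalIPs = append(externalIPs, service.Spec.ExternalIPs...)
if config.Gateway.Mode == config.GatewayModeShared { // <-- proposed one-liner guard
	for _, ingress := range service.Status.LoadBalancer.Ingress {
		if len(ingress.IP) > 0 {
			externalIPs = append(externalIPs, ingress.IP)
		}
	}
}
```

With such a guard, LGW would keep producing the LoadBalancer DNAT rule only through the getLoadBalancerIPTRules/NODEPORT path shown above.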
Able to reproduce this on GCP as well as on vSphere+MetalLB.

GCP: switch to LGW mode, create an LB svc:

sh-4.4# iptables-save | grep 104.
-A OVN-KUBE-NODEPORT -d 104.155.139.228/32 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.30.152.59:8080

restart ovnkube-node:

sh-4.4# iptables-save | grep 104.
-A OVN-KUBE-NODEPORT -d 104.155.139.228/32 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.30.152.59:8080
-A OVN-KUBE-EXTERNALIP -d 104.155.139.228/32 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.30.152.59:8080

We end up with 2 rules. Tested the quick fix:

sh-4.4# iptables-save | grep 104.
-A gcp-vips -d 104.155.139.228/32 -j REDIRECT
-A OVN-KUBE-NODEPORT -d 104.155.139.228/32 -p tcp -m tcp --dport 8080 -j DNAT --to-destination 172.30.152.59:8080

We end up with only 1 rule again.

With MetalLB+vSphere (bigger problem if we reuse the same SVC VIP for the LB, as we do below):

[surya@hidden-temple ovn-kubernetes]$ oc get svc -n a1
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
hello-world   LoadBalancer   172.30.22.125   172.31.249.5   80:32509/TCP   174m

sh-4.4# iptables-save -c | grep 172.31.249.5
[0:0] -A OVN-KUBE-NODEPORT -d 172.31.249.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.22.125:80
[0:0] -A OVN-KUBE-EXTERNALIP -d 172.31.249.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.22.125:80

Recreating the svc with a different ClusterIP but the same LB VIP leaves stale entries in the EXTERNALIP chain:

[surya@hidden-temple ovn-kubernetes]$ oc get svc -n a1
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
hello-world   LoadBalancer   172.30.95.195   <pending>     80:31835/TCP   18s

sh-4.4# iptables-save -c | grep 172.31.249.5
[0:0] -A OVN-KUBE-NODEPORT -d 172.31.249.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.95.195:80
[0:0] -A OVN-KUBE-EXTERNALIP -d 172.31.249.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.22.125:80

Testing my fix on MetalLB+vSphere:

[surya@hidden-temple ovn-kubernetes]$ oc get svc -n a1
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
hello-world   LoadBalancer   172.30.95.195   172.31.249.5   80:31835/TCP   51s

sh-4.4# iptables-save -c | grep 172.31.249.5
[0:0] -A OVN-KUBE-NODEPORT -d 172.31.249.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.95.195:80

Even after recreating the svc and/or restarting the pod, it stays the same and gets updated correctly, with no stale rules in the EXTERNALIP chain:

[surya@hidden-temple ovn-kubernetes]$ oc get svc -n a1
NAME          TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)        AGE
hello-world   LoadBalancer   172.30.226.53   172.31.249.5   80:30606/TCP   13s

sh-4.4# iptables-save -c | grep 172.31.249.5
[0:0] -A OVN-KUBE-NODEPORT -d 172.31.249.5/32 -p tcp -m tcp --dport 80 -j DNAT --to-destination 172.30.226.53:80

Thanks Arti for helping with setup/testing!
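Side note for anyone verifying this on a node: the checks above are just iptables-save | grep; as a convenience, here is a small, hypothetical Go sketch (not part of ovn-kubernetes; it assumes the github.com/coreos/go-iptables library and root access on the node) that lists the OVN-KUBE-EXTERNALIP chain so stale DNAT entries like the 172.30.22.125 one above are easy to spot:

```
package main

import (
	"fmt"
	"strings"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		panic(err)
	}
	// Dump the NAT rules ovnkube-node programs for external/LoadBalancer IPs.
	rules, err := ipt.List("nat", "OVN-KUBE-EXTERNALIP")
	if err != nil {
		panic(err)
	}
	for _, rule := range rules {
		// Example filter: flag rules that still DNAT the LB VIP used above.
		if strings.Contains(rule, "172.31.249.5") {
			fmt.Println("EXTERNALIP rule for LB VIP:", rule)
		}
	}
}
```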
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.8.49 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:6308