Created attachment 1838313 [details]
must gather

Description of problem:

A route is not consistently served by a LoadBalancerService-type IngressController with a route selector. The test creates a LoadBalancerService-type IngressController with a route selector and then creates a route matching the route selector label. When connectivity to the route is checked, the connection is not consistent: it sometimes returns "connection refused".

Version-Release number of selected component (if applicable):
OCP 4.9.0-0.nightly-2021-10-27-202207
OSP 16.1.6 with Octavia and the OVN provider

How reproducible: always

Steps to Reproduce:

## 1. Install OCP IPI with the OVNKubernetes network type on top of OSP

## 2. Set the OVN Octavia provider in the cloud provider config

$ oc edit cm cloud-provider-config -n openshift-config
[...]
[LoadBalancer]
use-octavia = True
lb-provider = ovn              <---
lb-method = SOURCE_IP_PORT     <---
[...]

## 3. Wait until the config is applied (no nodes with unschedulable=true); it can take up to 20 minutes

$ oc get nodes --field-selector spec.unschedulable=true   # until this returns no nodes

## 4. Create a new LoadBalancerService-type IngressController with a label-match (type=internal) route selector

$ cat ingress_controller_label.yaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: sharding-test-internal
    namespace: openshift-ingress-operator
  spec:
    domain: sharding-test-internal.internalapps.apps.ostest.shiftstack.com
    endpointPublishingStrategy:
      type: LoadBalancerService
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
    routeSelector:
      matchLabels:
        type: internal
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc apply -f ingress_controller_label.yaml

## 5. Wait until the LB for the svc router-sharding-test-internal is created (save the route FIP for later use)

$ oc -n openshift-ingress get services/router-sharding-test-internal -o yaml
[...]
status:
  loadBalancer:
    ingress:
    - ip: 10.0.0.183   <---
[...]

## 6. Create a project, deployment, svc and a route with the label type=internal

$ oc new-project sharding-test-internal-ns
$ oc create deployment sharding-test-internal-dep --image=quay.io/kuryr/demo
$ oc scale deployments/sharding-test-internal-dep --replicas=2
$ oc expose deployment sharding-test-internal-dep --name sharding-test-internal-svc --port 80 --target-port=8080
$ oc expose service sharding-test-internal-svc --name test-sharding-internal-route -l type=internal --hostname=test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com

Check the routers that expose the host:

$ oc -n sharding-test-internal-ns describe route
[...]
Requested Host:  test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com
                   exposed on router default (host router-default.apps.ostest.shiftstack.com) 11 seconds ago
                   exposed on router sharding-test-internal (host router-sharding-test-internal.sharding-test-internal.internalapps.apps.ostest.shiftstack.com) 10 seconds ago
[...]

## 7. Add the following entry to the /etc/hosts file

<ROUTE_FIP> test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com

## 8. Check connectivity to the route - the connection is not consistent as it's returning "connection refused" sometimes

[stack@undercloud-0 ~]$ curl http://test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com
sharding-test-internal-dep-6c64db5ddd-msw7c: HELLO! I AM ALIVE!!!
[stack@undercloud-0 ~]$ curl http://test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com
curl: (7) Failed to connect to test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com port 80: Connection refused

[stack@undercloud-0 ~]$ curl http://test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com
curl: (7) Failed to connect to test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com port 80: Connection refused

[stack@undercloud-0 ~]$ curl http://test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com
sharding-test-internal-dep-6c64db5ddd-9nbs8: HELLO! I AM ALIVE!!!

Actual results:
Some requests to the route are refused.

Expected results:
Consistent HTTP responses.

Additional info:

ClusterID: 4ba1855f-7059-4791-85e7-fb0bacc22277
ClusterVersion: Stable at "4.9.0-0.nightly-2021-10-27-202207"
ClusterOperators: All healthy and stable

$ oc -n sharding-test-internal-ns describe route
Name:             test-sharding-internal-route
Namespace:        sharding-test-internal-ns
Created:          19 minutes ago
Labels:           type=internal
Annotations:      <none>
Requested Host:   test.sharding-test-internal.internalapps.apps.ostest.shiftstack.com
                    exposed on router default (host router-default.apps.ostest.shiftstack.com) 19 minutes ago
                    exposed on router sharding-test-internal (host router-sharding-test-internal.sharding-test-internal.internalapps.apps.ostest.shiftstack.com) 19 minutes ago
Path:             <none>
TLS Termination:  <none>
Insecure Policy:  <none>
Endpoint Port:    8080

Service:    sharding-test-internal-svc
Weight:     100 (100%)
Endpoints:  10.128.2.12:8080, 10.131.0.7:8080

$ oc -n sharding-test-internal-ns get pods -o wide
NAME                                          READY   STATUS    RESTARTS   AGE   IP            NODE                          NOMINATED NODE   READINESS GATES
sharding-test-internal-dep-6c64db5ddd-9nbs8   1/1     Running   0          21m   10.128.2.12   ostest-mhb5k-worker-0-98lpw   <none>           <none>
sharding-test-internal-dep-6c64db5ddd-msw7c   1/1     Running   0          21m   10.131.0.7    ostest-mhb5k-worker-0-lpsgc   <none>           <none>

$ oc -n openshift-ingress get pods
NAME                                             READY   STATUS    RESTARTS   AGE
router-default-6f46648cd9-4np6p                  1/1     Running   0          38m
router-default-6f46648cd9-njqpr                  1/1     Running   0          41m
router-sharding-test-internal-7f66d49cbd-2b524   1/1     Running   0          20m
router-sharding-test-internal-7f66d49cbd-cgvj8   1/1     Running   0          20m

$ oc -n openshift-ingress logs router-sharding-test-internal-7f66d49cbd-2b524
I1029 12:28:53.578197 1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 2d1e1f4bd413dd283c92638e23fae940ef4c1e54\nversionFromGit: 4.0.0-345-g2d1e1f4b\ngitTreeState: clean\nbuildDate: 2021-10-26T23:47:20Z\n"
I1029 12:28:53.584023 1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
I1029 12:28:53.598462 1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
I1029 12:28:53.598691 1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s"
I1029 12:28:53.599140 1 router.go:337] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
I1029 12:28:53.599250 1 router.go:262] router "msg"="router is including routes in all namespaces"
E1029 12:28:53.710688 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1029 12:28:53.761744 1 router.go:612] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1029 12:28:58.747902 1 router.go:612] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1029 12:31:26.177732 1 router.go:612] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

$ oc -n openshift-ingress logs router-sharding-test-internal-7f66d49cbd-cgvj8
I1029 12:28:54.607933 1 template.go:437] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: 2d1e1f4bd413dd283c92638e23fae940ef4c1e54\nversionFromGit: 4.0.0-345-g2d1e1f4b\ngitTreeState: clean\nbuildDate: 2021-10-26T23:47:20Z\n"
I1029 12:28:54.614451 1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936"
I1029 12:28:54.627379 1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy"
I1029 12:28:54.627506 1 router.go:273] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s"
I1029 12:28:54.628020 1 router.go:337] template "msg"="watching for changes" "path"="/etc/pki/tls/private"
I1029 12:28:54.628110 1 router.go:262] router "msg"="router is including routes in all namespaces"
E1029 12:28:54.741040 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1029 12:28:54.790612 1 router.go:612] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1029 12:31:26.513812 1 router.go:612] template "msg"="router reloaded" "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
How reproducible is this? There could be many reasons why this is happening. Is OVN stable in OpenStack? This is either an issue with OVN reliability on the cloud platform or an OpenShift networking problem. All we do is provision resources, and as far as I can tell that has been done correctly, since you are able to reach the end service at least some of the time.
I forgot to mention that the same test works with OpenShiftSDN and Kuryr but not with OVNKubernetes, which tells us the problem is not in the underlying OpenStack OVN. It's reproducible almost 100% of the time.
Moving over to ovn-kubernetes as it's only happening with OVNKubernetes according to comment 2.
The svc has ExternalTrafficPolicy set to Local. The svc in use here has 2 endpoints, on nodes ostest-cdmd8-worker-0-4mwck and ostest-cdmd8-worker-0-p85j4. When curl works, the traffic hits one of those worker endpoint pods; when curl fails, the traffic goes to one of the master nodes, as shown in the tcpdump traces:

oc debug node/ostest-cdmd8-master-0
Starting pod/ostest-cdmd8-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.196.1.62
If you don't see a command prompt, try pressing enter.
sh-4.4# tcpdump -i any host 10.46.22.194 -nn -vvv
dropped privs to tcpdump
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture size 262144 bytes
12:50:16.190193 IP (tos 0x0, ttl 63, id 56587, offset 0, flags [DF], proto TCP (6), length 60)
    10.46.22.194.43792 > 10.196.1.62.32321: Flags [S], cksum 0x77e6 (correct), seq 536703276, win 29200, options [mss 1460,sackOK,TS val 3505563049 ecr 0,nop,wscale 7], length 0
12:50:16.191991 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 40)
    10.196.1.62.32321 > 10.46.22.194.43792: Flags [R.], cksum 0xc862 (correct), seq 0, ack 536703277, win 0, length 0

So either the Octavia LB doesn't support ETP=Local and we need to change the policy to Cluster, or the Octavia LB has a bug where, with ETP=Local, it doesn't send the packets to the right nodes, causing the curl failures.
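For anyone reproducing this, two quick checks make the mismatch visible. This is a sketch: the service and label names follow the sharding-test-internal example earlier in this bug, adjust as needed.

$ oc -n openshift-ingress get svc router-sharding-test-internal -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}'
Local
$ oc -n openshift-ingress get pods -o wide -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=sharding-test-internal

With ETP=Local, only the nodes listed by the second command can answer on the service NodePort; a SYN forwarded to any other node (e.g. a master, as in the trace above) gets reset.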
(In reply to Mohamed Mahmoud from comment #13)
> svc is using ExternalTrafficPolicy set to local, the svc in use here has 2
> Endpoints on node ostest-cdmd8-worker-0-4mwck and
> ostest-cdmd8-worker-0-p85j4
> when curl works the traffic hit one of those workers EP pod
> it was noticed when curl fail the traffic goes to one of the master nodes as
> shown in the tcpdump traces
> oc debug node/ostest-cdmd8-master-0
> Starting pod/ostest-cdmd8-master-0-debug ...
> To use host binaries, run `chroot /host`
> Pod IP: 10.196.1.62
> If you don't see a command prompt, try pressing enter.
> sh-4.4# tcpdump -i any host 10.46.22.194 -nn -vvv
> dropped privs to tcpdump
> tcpdump: listening on any, link-type LINUX_SLL (Linux cooked v1), capture
> size 262144 bytes
> 12:50:16.190193 IP (tos 0x0, ttl 63, id 56587, offset 0, flags [DF], proto
> TCP (6), length 60)
> 10.46.22.194.43792 > 10.196.1.62.32321: Flags [S], cksum 0x77e6
> (correct), seq 536703276, win 29200, options [mss 1460,sackOK,TS val
> 3505563049 ecr 0,nop,wscale 7], length 0
> 12:50:16.191991 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP
> (6), length 40)
> 10.196.1.62.32321 > 10.46.22.194.43792: Flags [R.], cksum 0xc862
> (correct), seq 0, ack 536703277, win 0, length 0
>
> so either Octiva LB doesn't support ETP local and we need to change it to
> cluster or Octiva LB has a bug when svc is ETP local it doesn't send the
> packets to the right nodes causing curl failure

Or it sends them to a worker node that doesn't have an endpoint; in all those cases curl will fail.
Setting the ExternalTrafficPolicy to Cluster makes curl work 100% of the time.

Steps:

## 1. Create a new LoadBalancerService-type IngressController:

$ cat ingress_controller_cluster.yaml
apiVersion: v1
items:
- apiVersion: operator.openshift.io/v1
  kind: IngressController
  metadata:
    name: sharding-test-cluster
    namespace: openshift-ingress-operator
  spec:
    domain: sharding-test-cluster.internalapps.apps.ostest.shiftstack.com
    endpointPublishingStrategy:
      type: LoadBalancerService
    nodePlacement:
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/worker: ""
    routeSelector:
      matchLabels:
        type: cluster
  status: {}
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc apply -f ingress_controller_cluster.yaml

## 2. Check that the router pods are created

$ oc -n openshift-ingress get pods -o wide
NAME                                           READY   STATUS    RESTARTS   AGE   IP             NODE                          NOMINATED NODE   READINESS GATES
router-default-84d67fb69f-b7hmn                1/1     Running   0          26h   10.196.0.250   ostest-cdmd8-worker-0-4mwck   <none>           <none>
router-default-84d67fb69f-fftwp                1/1     Running   0          26h   10.196.1.104   ostest-cdmd8-worker-0-jcwmq   <none>           <none>
router-sharding-test-cluster-9c9ff8898-fv6vq   1/1     Running   0          39m   10.128.2.28    ostest-cdmd8-worker-0-4mwck   <none>           <none>
router-sharding-test-cluster-9c9ff8898-x2bcg   1/1     Running   0          39m   10.131.0.123   ostest-cdmd8-worker-0-p85j4   <none>           <none>

## 3. Check that the router LB-type svc is created

[stack@undercloud-0 ~]$ oc -n openshift-ingress get svc -o wide
NAME                                    TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE    SELECTOR
router-internal-default                 ClusterIP      172.30.217.73   <none>         80/TCP,443/TCP,1936/TCP      2d4h   ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default
router-internal-sharding-test-cluster   ClusterIP      172.30.159.7    <none>         80/TCP,443/TCP,1936/TCP      40m    ingresscontroller.operator.openshift.io/deployment-ingresscontroller=sharding-test-cluster
router-sharding-test-cluster            LoadBalancer   172.30.93.207   10.46.22.230   80:32661/TCP,443:32584/TCP   40m    ingresscontroller.operator.openshift.io/deployment-ingresscontroller=sharding-test-cluster

## 4. Check the LB IP

$ oc -n openshift-ingress get services/router-sharding-test-cluster -o yaml
[...]
status:
  loadBalancer:
    ingress:
    - ip: 10.46.22.230   <---
[...]

## 5. Check the router-sharding-test-cluster ExternalTrafficPolicy type

$ oc -n openshift-ingress describe svc router-sharding-test-cluster
Name:                     router-sharding-test-cluster
Namespace:                openshift-ingress
Labels:                   app=router
                          ingresscontroller.operator.openshift.io/owning-ingresscontroller=sharding-test-cluster
                          router=router-sharding-test-cluster
Annotations:              traffic-policy.network.alpha.openshift.io/local-with-fallback:
Selector:                 ingresscontroller.operator.openshift.io/deployment-ingresscontroller=sharding-test-cluster
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.93.207
IPs:                      172.30.93.207
LoadBalancer Ingress:     10.46.22.230
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  32661/TCP
Endpoints:                10.128.2.28:80,10.131.0.123:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32584/TCP
Endpoints:                10.128.2.28:443,10.131.0.123:443
Session Affinity:         None
External Traffic Policy:  Local   <<<<<<<<<
HealthCheck NodePort:     32413
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  92s   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   51s   service-controller  Ensured load balancer

The svc was created with the Local policy.

## 6. The issue is reproduced at this point: curl works only intermittently (note it's still with "External Traffic Policy: Local")

## 7. Change External Traffic Policy to Cluster

$ oc -n openshift-ingress edit svc router-sharding-test-cluster
service/router-sharding-test-cluster edited

$ oc -n openshift-ingress describe svc router-sharding-test-cluster
Name:                     router-sharding-test-cluster
Namespace:                openshift-ingress
Labels:                   app=router
                          ingresscontroller.operator.openshift.io/owning-ingresscontroller=sharding-test-cluster
                          router=router-sharding-test-cluster
Annotations:              traffic-policy.network.alpha.openshift.io/local-with-fallback:
Selector:                 ingresscontroller.operator.openshift.io/deployment-ingresscontroller=sharding-test-cluster
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.93.207
IPs:                      172.30.93.207
LoadBalancer Ingress:     10.46.22.230
Port:                     http  80/TCP
TargetPort:               http/TCP
NodePort:                 http  32661/TCP
Endpoints:                10.128.2.28:80,10.131.0.123:80
Port:                     https  443/TCP
TargetPort:               https/TCP
NodePort:                 https  32584/TCP
Endpoints:                10.128.2.28:443,10.131.0.123:443
Session Affinity:         None
External Traffic Policy:  Cluster   <<<<<<<<<
Events:
  Type    Reason                 Age                    From                Message
  ----    ------                 ----                   ----                -------
  Normal  EnsuringLoadBalancer   2m45s (x2 over 10m)    service-controller  Ensuring load balancer
  Normal  ExternalTrafficPolicy  2m45s                  service-controller  Local -> Cluster
  Normal  EnsuredLoadBalancer    2m44s (x2 over 9m19s)  service-controller  Ensured load balancer

## 8. curl starts working 100% of the time
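For scripting, the same change can be applied non-interactively. This is just a sketch equivalent to the `oc edit` in step 7; note that the ingress operator owns this Service, so it may reconcile the field back eventually, making this suitable only for testing.

$ oc -n openshift-ingress patch svc router-sharding-test-cluster --type merge -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
service/router-sharding-test-cluster patched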
Removing the Triaged keyword because:
* the QE automation assessment (flag qe_test_coverage) is missing
How should this be solved? Is the expectation that with ETP=Local the LB created in Octavia only includes the nodes that host the Service's pods? How would the cloud provider know that? Should it analyze the Service endpoints and check where the pods are placed? Normally we could use health monitors to solve this (members on nodes without pods would simply be marked as down), but ovn-octavia-provider doesn't support them.
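For what it's worth, Kubernetes already publishes the signal a load balancer would need here: a Service with ETP=Local gets a health-check NodePort (32413 in the `describe svc` output above), and a node is supposed to answer 200 on it only when it hosts a local endpoint. A rough sketch of what a monitor would probe; the node addresses are placeholders and the exact path/response behaviour on OVN-Kubernetes is worth double-checking:

# Probe the ETP=Local health-check NodePort on two nodes; only the node
# running a router pod should report healthy.
$ curl -s -o /dev/null -w '%{http_code}\n' http://<worker-with-router-pod>:32413/healthz
200
$ curl -s -o /dev/null -w '%{http_code}\n' http://<worker-without-router-pod>:32413/healthz
503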
Okay, I think that AWS and GCP cloud-providers solve this using health checks to make sure traffic is not directed to the nodes that will not answer. It's a pickle to solve this for ovn-octavia-provider as health monitors are not supported there yet. As an alternative we could attempt to only add these nodes that are hosting the Service pods to the LB, but that would require us to watch Pods, so it's not ideal as it's not really the model cloud provider interfaces are designed for. At this moment I believe we should document that OVN LBs + ETP=Local won't work. Then in the cloud-provider we can attempt to implement an option to force Amphora for any Service that has ETP=Local.
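To confirm what the cloud provider actually programmed, the Octavia pool members can be listed. A sketch with the standard OpenStack CLI; the LB and pool IDs are placeholders. Given the tcpdump above showed traffic landing on a master, the member list should include nodes that never run the router pods.

$ openstack loadbalancer list --provider ovn
$ openstack loadbalancer pool list --loadbalancer <lb-id>
$ openstack loadbalancer member list <pool-id>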
(In reply to Michał Dulko from comment #22) > Okay, I think that AWS and GCP cloud-providers solve this using health > checks to make sure traffic is not directed to the nodes that will not > answer. It's a pickle to solve this for ovn-octavia-provider as health > monitors are not supported there yet. As an alternative we could attempt to > only add these nodes that are hosting the Service pods to the LB, but that > would require us to watch Pods, so it's not ideal as it's not really the > model cloud provider interfaces are designed for. > > At this moment I believe we should document that OVN LBs + ETP=Local won't > work. Then in the cloud-provider we can attempt to implement an option to > force Amphora for any Service that has ETP=Local. this statement is not accurate we are able run with metallb LB with ETP local w/o any issues, again this is limitation with ovn octavia LB, and if you wanted to doc you that that is fine by me. for ETP local to work LB should aim traffic to nodes that only have endpoint(s) that is the basic requirement for LB.
(In reply to Mohamed Mahmoud from comment #24)
> (In reply to Michał Dulko from comment #22)
> > Okay, I think that AWS and GCP cloud-providers solve this using health
> > checks to make sure traffic is not directed to the nodes that will not
> > answer. It's a pickle to solve this for ovn-octavia-provider as health
> > monitors are not supported there yet. As an alternative we could attempt to
> > only add these nodes that are hosting the Service pods to the LB, but that
> > would require us to watch Pods, so it's not ideal as it's not really the
> > model cloud provider interfaces are designed for.
> >
> > At this moment I believe we should document that OVN LBs + ETP=Local won't
> > work. Then in the cloud-provider we can attempt to implement an option to
> > force Amphora for any Service that has ETP=Local.
>
> this statement is not accurate we are able run with metallb LB with ETP
> local w/o any issues, again this is limitation with ovn octavia LB, and if
> you wanted to doc you that that is fine by me.
> for ETP local to work LB should aim traffic to nodes that only have
> endpoint(s) that is the basic requirement for LB.

Yeah, by "OVN LB" I meant LBs backed by Octavia and its octavia-ovn-provider. It's quite confusing, I get that. ;)
BTW, I realized today that the upstream cloud provider openstack has a note about this on the `create-monitor` option [1]:

```
create-monitor
  Indicates whether or not to create a health monitor for the service load balancer. A health monitor required for services that declare externalTrafficPolicy: Local. Default: false
```

The LB annotation documentation provides a bit more detail [2]:

```
The health monitor can be created or deleted dynamically. A health monitor is required for services with externalTrafficPolicy: Local. Not supported when lb-provider=ovn is configured in openstack-cloud-controller-manager.
```

If we were to set `create-monitor=true` for the in-tree cloud provider, we would also have to set `monitor-delay`, `monitor-timeout`, and `monitor-max-retries` as well, since they do not get default values there [3]. We would also have to force the Amphora LB provider with `lb-provider=amphora`.

We should fix this in the docs.

[1] https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/using-openstack-cloud-controller-manager.md#load-balancer
[2] https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/openstack-cloud-controller-manager/expose-applications-using-loadbalancer-type-service.md#service-annotations
[3] https://kubernetes-docsy-staging.netlify.app/docs/concepts/cluster-administration/cloud-providers/#load-balancer
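For illustration, a sketch of what that could look like in the [LoadBalancer] section of the cloud-provider-config, using the option names from [1]/[3]. The values are only examples and would need validation; lb-method would likely need to change too, since SOURCE_IP_PORT is specific to the OVN provider.

$ oc edit cm cloud-provider-config -n openshift-config
[...]
[LoadBalancer]
use-octavia = True
lb-provider = amphora        # health monitors are not supported by the ovn provider
create-monitor = True        # required for externalTrafficPolicy: Local
monitor-delay = 10s
monitor-timeout = 10s
monitor-max-retries = 1
[...]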
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:7399