So this bug isn't specific to AWS. I think I know what the problem is here. In OVNK we haven't implemented the healthcheck for the `UpdateService` logic. So when the traffic policy changes from cluster to local, the external load balancers (AWS/GCP) aren't able to have successful healthchecks towards the healthCheck NodePort service. Hence the curl fails. Reproduced this on GCP and AWS and learnt something new about health checks and etp=local from https://kubernetes.io/docs/tutorials/services/source-ip/#source-ip-for-services-with-type-loadbalancer and https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip.

```
[surya@hidden-temple yaml_debugging]$ oc describe svc -n surya
Name:                     hello-world-2
Namespace:                surya
Labels:                   <none>
Annotations:              <none>
Selector:                 run=load-balancer-example-2
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.63.245
IPs:                      172.30.63.245
LoadBalancer Ingress:     34.zzz.yy.xxx
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31854/TCP
Endpoints:                10.131.0.19:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  112s  service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   74s   service-controller  Ensured load balancer
[surya@hidden-temple yaml_debugging]$ curl 34.zzz.yy.xxx:80
Hello Kubernetes!

[surya@hidden-temple yaml_debugging]$ kubectl patch svc -n surya hello-world-2 -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/hello-world-2 patched
[surya@hidden-temple yaml_debugging]$ oc describe svc -n surya
Name:                     hello-world-2
Namespace:                surya
Labels:                   <none>
Annotations:              <none>
Selector:                 run=load-balancer-example-2
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.63.245
IPs:                      172.30.63.245
LoadBalancer Ingress:     34.zzz.yy.xxx
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  31854/TCP
Endpoints:                10.131.0.19:8080
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     32312
Events:
  Type    Reason                 Age                 From                Message
  ----    ------                 ----                ----                -------
  Normal  EnsuredLoadBalancer    97s                 service-controller  Ensured load balancer
  Normal  EnsuringLoadBalancer   2s (x2 over 2m15s)  service-controller  Ensuring load balancer
  Normal  ExternalTrafficPolicy  2s                  service-controller  Cluster -> Local
[surya@hidden-temple yaml_debugging]$ curl 34.zzz.yy.xxx:80
curl: (28) Failed to connect to 34.zzz.yy.xxx port 80 after 128656 ms: Connection timed out
```
Maybe I should add an upstream e2e about updating the ETP value. That might help catch this in CNI plugins if they aren't implemented.
https://github.com/ovn-org/ovn-kubernetes/pull/3164 posted. Tested on AWS:

```
[surya@hidden-temple yaml_debugging]$ oc describe svc -n surya2
Name:                     hello-world-2
Namespace:                surya2
Labels:                   <none>
Annotations:              <none>
Selector:                 run=load-balancer-example-2
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.126.107
IPs:                      172.30.126.107
LoadBalancer Ingress:     blah.us-east-2.elb.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  32480/TCP
Endpoints:                10.128.2.4:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  EnsuringLoadBalancer  51s   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   47s   service-controller  Ensured load balancer
[surya@hidden-temple yaml_debugging]$ kubectl patch svc -n surya2 hello-world-2 -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/hello-world-2 patched
[surya@hidden-temple yaml_debugging]$ oc describe svc -n surya2
Name:                     hello-world-2
Namespace:                surya2
Labels:                   <none>
Annotations:              <none>
Selector:                 run=load-balancer-example-2
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.126.107
IPs:                      172.30.126.107
LoadBalancer Ingress:     blah.us-east-2.elb.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  32480/TCP
Endpoints:                10.128.2.4:8080
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     32161
Events:
  Type    Reason                 Age                From                Message
  ----    ------                 ----               ----                -------
  Normal  EnsuringLoadBalancer   27s (x2 over 93s)  service-controller  Ensuring load balancer
  Normal  ExternalTrafficPolicy  27s                service-controller  Cluster -> Local
  Normal  EnsuredLoadBalancer    26s (x2 over 89s)  service-controller  Ensured load balancer

$ oc rsh -c ovnkube-node-8wrpr -n openshift-ovn-kubernetes
error: pod, type/name or --filename must be specified
[surya@hidden-temple ovn-kubernetes]$ oc rsh -c ovnkube-node -n openshift-ovn-kubernetes ovnkube-node-8wrpr
sh-4.4# curl localhost:32161/healthz
{
	"service": {
		"namespace": "surya2",
		"name": "hello-world-2"
	},
	"localEndpoints": 1
}

[surya@hidden-temple yaml_debugging]$ kubectl patch svc -n surya2 hello-world-2 -p '{"spec":{"externalTrafficPolicy":"Cluster"}}'
service/hello-world-2 patched
[surya@hidden-temple yaml_debugging]$ oc describe svc -n surya2
Name:                     hello-world-2
Namespace:                surya2
Labels:                   <none>
Annotations:              <none>
Selector:                 run=load-balancer-example-2
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.126.107
IPs:                      172.30.126.107
LoadBalancer Ingress:     blah.us-east-2.elb.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  32480/TCP
Endpoints:                10.128.2.4:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:
  Type    Reason                 Age                  From                Message
  ----    ------                 ----                 ----                -------
  Normal  ExternalTrafficPolicy  3m36s                service-controller  Cluster -> Local
  Normal  EnsuringLoadBalancer   7s (x3 over 4m42s)   service-controller  Ensuring load balancer
  Normal  ExternalTrafficPolicy  7s                   service-controller  Local -> Cluster
  Normal  EnsuredLoadBalancer    6s (x3 over 4m38s)   service-controller  Ensured load balancer

sh-4.4# curl localhost:32161/healthz
curl: (7) Failed to connect to localhost port 32161: Connection refused
sh-4.4#

$ curl blah.us-east-2.elb.amazonaws.com:80
Hello Kubernetes!

$ kubectl patch svc -n surya2 hello-world-2 -p '{"spec":{"externalTrafficPolicy":"Local"}}'
service/hello-world-2 patched
[surya@hidden-temple yaml_debugging]$ oc describe svc -n surya2
Name:                     hello-world-2
Namespace:                surya2
Labels:                   <none>
Annotations:              <none>
Selector:                 run=load-balancer-example-2
Type:                     LoadBalancer
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.30.126.107
IPs:                      172.30.126.107
LoadBalancer Ingress:     blah.us-east-2.elb.amazonaws.com
Port:                     <unset>  80/TCP
TargetPort:               8080/TCP
NodePort:                 <unset>  32480/TCP
Endpoints:                10.128.2.4:8080
Session Affinity:         None
External Traffic Policy:  Local
HealthCheck NodePort:     31983
Events:
  Type    Reason                 Age                  From                Message
  ----    ------                 ----                 ----                -------
  Normal  ExternalTrafficPolicy  3m50s                service-controller  Local -> Cluster
  Normal  EnsuringLoadBalancer   6s (x4 over 8m25s)   service-controller  Ensuring load balancer
  Normal  ExternalTrafficPolicy  6s (x2 over 7m19s)   service-controller  Cluster -> Local
  Normal  EnsuredLoadBalancer    5s (x4 over 8m21s)   service-controller  Ensured load balancer

sh-4.4# curl localhost:31983/healthz
{
	"service": {
		"namespace": "surya2",
		"name": "hello-world-2"
	},
	"localEndpoints": 1
}

[surya@hidden-temple yaml_debugging]$ curl blah.us-east-2.elb.amazonaws.com:80
Hello Kubernetes!
```
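As an added example (not OVNK's code or the linked PR), this Python sketch models the `UpdateService` behaviour described in the first comment and the healthz responses seen in the AWS test: the listener on the healthCheck NodePort is opened on Cluster -> Local and closed on Local -> Cluster, and it answers 200 with the `{service, localEndpoints}` body while the node has local endpoints and 503 otherwise (kube-proxy's healthcheck server semantics). `HealthCheckManager`, `probe` and the endpoint counter are hypothetical names; port 0 lets the OS pick a free port for the demo.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class _HealthzHandler(BaseHTTPRequestHandler):
    # /healthz on the healthCheck NodePort: 200 while this node has local
    # endpoints for the service, 503 otherwise (kube-proxy semantics).
    def do_GET(self):
        ns, name = self.server.service            # set by HealthCheckManager
        n = self.server.local_endpoints()
        body = json.dumps({"service": {"namespace": ns, "name": name},
                           "localEndpoints": n}).encode()
        self.send_response(200 if n > 0 else 503)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

class HealthCheckManager:
    # Hypothetical UpdateService handler: opens the healthz listener when a
    # service turns etp=Local and closes it again on Local -> Cluster.
    def __init__(self, local_endpoint_counter):
        self.count = local_endpoint_counter       # (ns, name) -> int
        self.servers = {}                         # (ns, name) -> HTTPServer

    def update_service(self, ns, name, etp, hc_port):
        key = (ns, name)
        srv = self.servers.get(key)
        if srv and etp != "Local":                # Local -> Cluster: tear down
            srv.shutdown()
            srv.server_close()
            del self.servers[key]
            return None
        if etp == "Local" and srv is None:        # Cluster -> Local: listen
            srv = HTTPServer(("127.0.0.1", hc_port), _HealthzHandler)
            srv.service = key
            srv.local_endpoints = lambda k=key: self.count(k)
            threading.Thread(target=srv.serve_forever, daemon=True).start()
            self.servers[key] = srv
        return srv.server_address[1] if srv else None   # bound port

def probe(port):
    # What the external LB health check sees: (status, body), or None if refused.
    try:
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/healthz", timeout=2) as r:
            return r.status, json.loads(r.read())
    except urllib.error.HTTPError as e:
        return e.code, json.loads(e.read())
    except urllib.error.URLError:
        return None
```

Flipping the endpoint count to 0 while the service stays Local shows the other failure mode: the port answers, but with 503, so the LB marks the node unhealthy instead of timing out.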
Downstream merge to bring in the upstream commit is open: https://github.com/openshift/ovn-kubernetes/pull/1275/commits/4c7add6a642689b7429babf1761c13942c3a9577
Merged: https://github.com/openshift/ovn-kubernetes/pull/1289/commits/4c7add6a642689b7429babf1761c13942c3a9577. Since it's a bulk merge, manually moving state to MODIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399