Description of problem:

HAProxy does not generate its config correctly if the "selector" was removed from the service while the old Endpoints remained.

Version-Release number of selected component (if applicable):
Red Hat OpenShift Container Platform (RHOCP) 4.6

How reproducible:
100%

Steps to Reproduce:

1. Prepare the test pod and service as follows.

$ oc new-project test
$ oc new-app httpd -n test

$ oc get pod -o wide -n test
NAME                     READY   STATUS    RESTARTS   AGE   IP
httpd-7c7ccfffdc-wdkvk   1/1     Running   0          66s   10.128.2.8    <--- Pod IP

$ oc describe svc httpd -n test
Name:              httpd
Namespace:         test
:
Selector:          deployment=httpd       <--- The pod above is matched through this label selector.
Type:              ClusterIP
IP:                172.30.178.250
Port:              8080-tcp  8080/TCP
TargetPort:        8080/TCP
Endpoints:         10.128.2.8:8080        <--- The Endpoints match the pod IP.
Port:              8443-tcp  8443/TCP
TargetPort:        8443/TCP
Endpoints:         10.128.2.8:8443        <--- The Endpoints match the pod IP.

2. Remove the "selector" field.

$ oc replace -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  labels:
    app: httpd
    app.kubernetes.io/component: httpd
    app.kubernetes.io/instance: httpd
  name: httpd
  namespace: test
spec:
  clusterIP: 172.30.178.250
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  - name: 8443-tcp
    port: 8443
    protocol: TCP
    targetPort: 8443
  sessionAffinity: None
  type: ClusterIP
EOF

$ oc describe svc httpd -n test
Name:              httpd
Namespace:         test
:
Selector:          <none>                 <--- With the "selector" removed, the Endpoints IPs are no longer synced.
Type:              ClusterIP
IP:                172.30.178.250
Port:              8080-tcp  8080/TCP
TargetPort:        8080/TCP
Endpoints:         10.128.2.8:8080        <--- Old Endpoints remain.
Port:              8443-tcp  8443/TCP
TargetPort:        8443/TCP
Endpoints:         10.128.2.8:8443        <--- Old Endpoints remain.

3. Check the Endpoints after restarting the test pod. The existing service no longer syncs because the "selector" is gone, which is expected behavior.

$ oc delete pod httpd-7c7ccfffdc-wdkvk -n test

$ oc get pod -o wide -n test
NAME                     READY   STATUS    RESTARTS   AGE   IP
httpd-7c7ccfffdc-hd2dj   1/1     Running   0          19s   10.128.2.9    <--- Pod IP changes after the restart.

$ oc describe svc httpd -n test
Name:              httpd
Namespace:         test
:
Selector:          <none>                 <--- With the "selector" removed, the Endpoints IPs are no longer synced.
Type:              ClusterIP
IP:                172.30.178.250
Port:              8080-tcp  8080/TCP
TargetPort:        8080/TCP
Endpoints:         10.128.2.8:8080        <--- Old Endpoints remain.
Port:              8443-tcp  8443/TCP
TargetPort:        8443/TCP
Endpoints:         10.128.2.8:8443        <--- Old Endpoints remain.

4. Expose the service; the issue is then reproduced.

$ oc expose svc httpd -n test
$ oc logs -n openshift-ingress deploy/router-default
:
E0518 06:47:25.288227       1 limiter.go:165] error reloading router: exit status 1
[ALERT] 137/064725 (221) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-wdkvk:httpd:8080-tcp:10.128.2.8:8080' was already defined at line 326, please use distinct names.
[ALERT] 137/064725 (221) : Fatal errors found in configuration.

Actual results:

E0518 06:47:25.288227       1 limiter.go:165] error reloading router: exit status 1
[ALERT] 137/064725 (221) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-wdkvk:httpd:8080-tcp:10.128.2.8:8080' was already defined at line 326, please use distinct names.
[ALERT] 137/064725 (221) : Fatal errors found in configuration.
Expected results:


Additional info:
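If it helps to confirm what HAProxy is choking on, the duplicated server lines can be inspected directly in the generated config inside a router pod. The following is a minimal sketch, assuming the default ingress controller (so a router pod carrying the ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default label), the config path /var/lib/haproxy/conf/haproxy.config shown in the alert above, and standard shell utilities being available in the router image. The backend name be_http:test:httpd comes from the error message.

$ ROUTER_POD=$(oc -n openshift-ingress get pods -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o jsonpath='{.items[0].metadata.name}')

# Dump the backend that fails to parse; with the bug present, the same
# 'server pod:...' line appears twice.
$ oc -n openshift-ingress exec "$ROUTER_POD" -- grep -A 10 "backend be_http:test:httpd" /var/lib/haproxy/conf/haproxy.config

# List any server names defined more than once anywhere in the config.
$ oc -n openshift-ingress exec "$ROUTER_POD" -- sh -c "grep -oE 'server pod:[^ ]+' /var/lib/haproxy/conf/haproxy.config | sort | uniq -d"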
(In reply to Jobin A T from comment #0)

> HAProxy does not generate its config correctly if the "selector" was removed
> from the service while the old Endpoints remained.

In OCP 4.6, the router observes EndpointSlice resources instead of Endpoints, by the way.

> Version-Release number of selected component (if applicable):
> Red Hat OpenShift Container Platform (RHOCP) 4.6

Was the customer not observing this issue on OCP 4.5? Which 4.6.z version is the customer using specifically?

> Expected results:

What exactly is the outcome expected by the customer in this situation? When you remove the label selector from a service, you essentially put the service into an unmanaged state, so updating the relevant Endpoints/EndpointSlice resources to avoid HAProxy backend collisions becomes your responsibility (instead of the endpoint/endpointslice mirroring controller's).

It's not clear what we can do to resolve this issue, since removing the service selector is edging into unsupported territory. Does the customer have different expectations for how the router should behave in this situation?

Tagging need-info as we try to boil this BZ down to a solvable issue (if that's possible). Leaving in NEW state until then.
Hi team,

EndpointSlices can be duplicated when the selector is removed; see the output in [1]. In other words, the same endpoint ID can appear twice, so duplicate server records are added, as shown in the debug logs in [2]. This makes the router pods fail on the next reload/restart as follows.

[ALERT] 138/033357 (18) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-fc294:httpd:8080-tcp:10.128.2.8:8080' was already defined at line 326, please use distinct names.
[ALERT] 138/033357 (18) : Fatal errors found in configuration.
I0519 03:34:30.934144       1 template.go:690] router "msg"="Shutdown requested, waiting 45s for new connections to cease"

As of the latest Kubernetes (1.21; I'm not sure whether the upstream backport will be applied to OCP 4), the endpointslice controller fix [0] may suppress this issue.

[0] Updating EndpointSlice controllers to avoid duplicate creations
    - https://github.com/kubernetes/kubernetes/pull/100103

Regardless of this issue, I think that for the router to stay up and running stably, it would be better to check whether an endpoint ID is duplicated before adding the server records to "haproxy.config" (a CLI sketch of how to observe the duplication follows the log excerpt below).

[1] As soon as the "selector" field is removed, a duplicate EndpointSlice is generated. The two EndpointSlices are created by different triggers: the original one is generated from the Service, the other one is generated from the Endpoints (for manual Endpoints management).

$ oc get endpointslice,endpoints -o wide
NAME                                         ADDRESSTYPE   PORTS       ENDPOINTS    AGE
endpointslice.discovery.k8s.io/httpd-5sg47   IPv4          8443,8080   10.128.2.8   19m   <--- OwnerRef is the Endpoints. Generated after removing "selector".
endpointslice.discovery.k8s.io/httpd-qr7hh   IPv4          8443,8080   10.128.2.8   17h   <--- OwnerRef is the Service.

NAME               ENDPOINTS                         AGE
endpoints/httpd    10.128.2.8:8080,10.128.2.8:8443   17h

[2] Both of the duplicated EndpointSlices are processed (see the "processing subset" logs), which causes the parsing error.

I0519 01:09:57.252744       7 plugin.go:178] template "msg"="processing endpoints" "endpointCount"=2 "eventType"="MODIFIED" "name"="httpd" "namespace"="test"
I0519 01:09:57.252802       7 plugin.go:181] template "msg"="processing subset" "index"=0 "subset"={"addresses":[{"ip":"10.128.2.8","targetRef":{"kind":"Pod","namespace":"test","name":"httpd-7c7ccfffdc-fc294","uid":"4f918dc5-d020-44c9-ba7c-6e87009f33f0","resourceVersion":"105002"}}],"ports":[{"name":"8080-tcp","port":8080,"protocol":"TCP"},{"name":"8443-tcp","port":8443,"protocol":"TCP"}]}
I0519 01:09:57.252826       7 plugin.go:181] template "msg"="processing subset" "index"=1 "subset"={"addresses":[{"ip":"10.128.2.8","targetRef":{"kind":"Pod","namespace":"test","name":"httpd-7c7ccfffdc-fc294","uid":"4f918dc5-d020-44c9-ba7c-6e87009f33f0","resourceVersion":"105002"}}],"ports":[{"name":"8080-tcp","port":8080,"protocol":"TCP"},{"name":"8443-tcp","port":8443,"protocol":"TCP"}]}
I0519 01:09:57.252846       7 plugin.go:190] template "msg"="modifying endpoints" "key"="test/httpd"
I0519 01:09:57.252923       7 router.go:445] template "msg"="writing the router config"
I0519 01:09:57.252984       7 router.go:499] template "msg"="committing router certificate manager changes..."
I0519 01:09:57.252998       7 router.go:504] template "msg"="router certificate manager config committed"
I0519 01:09:57.257998       7 router.go:455] template "msg"="calling reload function" "fn"=0
I0519 01:09:57.259462       7 router.go:459] template "msg"="reloading the router"
E0519 01:09:57.277337       7 limiter.go:165] error reloading router: exit status 1
[ALERT] 138/010957 (80) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-fc294:httpd:8080-tcp:10.128.2.8:8080' was already defined at line 326, please use distinct names.
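As a way to observe the duplication from the CLI (the same data the router consumes), here is a minimal sketch. It relies on the kubernetes.io/service-name label that both slices carry (see the YAML later in this bug) and only illustrates the kind of uniqueness check suggested above; it is not the router-side fix itself.

# List each EndpointSlice for the service with its owner kind and first address.
$ oc -n test get endpointslices -l kubernetes.io/service-name=httpd \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.ownerReferences[0].kind}{"\t"}{.endpoints[*].addresses[0]}{"\n"}{end}'

# Flag pod addresses that appear in more than one slice (the duplication the
# router currently turns into duplicate "server" lines).
$ oc -n test get endpointslices -l kubernetes.io/service-name=httpd \
    -o jsonpath='{.items[*].endpoints[*].addresses[0]}' | tr ' ' '\n' | sort | uniq -d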
FYI, I've also opened PR here: https://github.com/openshift/router/pull/285
(In reply to Daein Park from comment #3)

> FYI, I've also opened PR here: https://github.com/openshift/router/pull/285

Thanks for sharing a potential patch! I will check with my team and see if this is the approach we want to take.

In the meantime, could you include the full EndpointSlice YAML for the 2 endpointslices you mention in Comment 2 (this will help me better understand the issue)?

Can you safely delete the endpointslice that has the service as its owner ref? Is that at least a workaround here?

Again, deleting the service selector from a ClusterIP service does not seem like a supported/typical flow, but I suppose we can better handle the outcome of doing so.

Also, note that https://github.com/kubernetes/kubernetes/pull/100103 should be included in OCP 4.8. If I get around to it, I will try to reproduce this issue on 4.6 and see if I can also reproduce it on 4.8 latest. It's not clear to me how that upstream patch would resolve the issue, but I agree that it could perhaps make a difference.
@Stephen Greene

Thank you for your prompt update.

> In the meantime, could you include the full EndpointSlice YAML for the 2 endpointslices you mention in Comment 2 (this will help me better understand the issue)?

Please refer to [0].

> Can you safely delete the endpointslice that has the service as its owner ref? Is that at least a workaround here?

Yes, you're right. The workaround is to remove the invalid/unmanaged slice. More precisely, which slice is regenerated depends on the state of the "selector" field in the existing service (a short sketch of this workaround follows the listings below):

1. If you removed the "selector" from the service, only the EndpointSlice with an Endpoints ownerRef is regenerated after you delete all the EndpointSlices. See [1] for my test result.
2. If you set the "selector" on the service again, only the EndpointSlice with a Service ownerRef is regenerated after you delete all the EndpointSlices. See [2] for my test result.

> Again, deleting the service selector from a ClusterIP service does not seem like a supported/typical flow, but I suppose we can better handle the outcome of doing so.

I agree with you. It's not usual, and removing the "selector" means the user has to manage their own Endpoints themselves. But this process of enabling self-managed endpoints can affect a running router, as in this issue, which is an unexpected result for users. If possible, it would be better to make the router more robust here, or at least document the behavior, to avoid this kind of surprise.

> Also, note that https://github.com/kubernetes/kubernetes/pull/100103 should be included in OCP 4.8. If I get around to it, I will try to reproduce this issue on 4.6 and see if I can also reproduce it on 4.8 latest. It's not clear to me how that upstream patch would resolve the issue, but I agree that it could perhaps make a difference.

As you mentioned, it may not be related to this issue; it looks like it only addresses duplicate EndpointSlice creations. But I think it is worth testing, because that fix changes the current EndpointSlice creation behavior.

[0] endpointslices

$ oc get endpointslices
NAME          ADDRESSTYPE   PORTS       ENDPOINTS     AGE
httpd-4hf9v   IPv4          8443,8080   10.129.2.16   109m
httpd-9bw7j   IPv4          8443,8080   10.129.2.16   6s    <--- Generated after removing the "selector" field from the service.
$ oc get endpointslices -o yaml
apiVersion: v1
items:
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1beta1
  endpoints:
  - addresses:
    - 10.129.2.16
    conditions:
      ready: true
    targetRef:
      kind: Pod
      name: httpd-7c7ccfffdc-mqps8
      namespace: test
      resourceVersion: "242425"
      uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
    topology:
      kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
      topology.kubernetes.io/region: ap-northeast-1
      topology.kubernetes.io/zone: ap-northeast-1a
  kind: EndpointSlice
  metadata:
    annotations:
      endpoints.kubernetes.io/last-change-trigger-time: "2021-05-18T08:03:23Z"
    creationTimestamp: "2021-05-19T12:45:39Z"
    generateName: httpd-
    generation: 2
    labels:
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: httpd
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-19T14:30:38Z"
    name: httpd-4hf9v
    namespace: test
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: httpd
      uid: 07f8582e-969b-4f52-bd74-f73a56a8bfae
    resourceVersion: "243485"
    selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-4hf9v
    uid: f8b63b4a-fd61-452b-b3b4-0c5e75e6d84a
  ports:
  - name: 8443-tcp
    port: 8443
    protocol: TCP
  - name: 8080-tcp
    port: 8080
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1beta1
  endpoints:
  - addresses:
    - 10.129.2.16
    conditions:
      ready: true
    targetRef:
      kind: Pod
      name: httpd-7c7ccfffdc-mqps8
      namespace: test
      resourceVersion: "242425"
      uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
    topology:
      kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
  kind: EndpointSlice
  metadata:
    creationTimestamp: "2021-05-19T14:34:34Z"
    generateName: httpd-
    generation: 1
    labels:
      app: httpd
      app.kubernetes.io/component: httpd
      app.kubernetes.io/instance: httpd
      endpointslice.kubernetes.io/managed-by: endpointslicemirroring-controller.k8s.io
      kubernetes.io/service-name: httpd
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-19T14:34:34Z"
    name: httpd-9bw7j
    namespace: test
    ownerReferences:
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Endpoints
      name: httpd
      uid: 0291165b-95cf-403c-8f06-2e39c1ab81fb
    resourceVersion: "245803"
    selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-9bw7j
    uid: 02c9837a-7568-4d91-9dd0-948c1cc4a07c
  ports:
  - name: 8443-tcp
    port: 8443
    protocol: TCP
  - name: 8080-tcp
    port: 8080
    protocol: TCP
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[1]

$ oc delete endpointslice httpd-4hf9v httpd-9bw7j
endpointslice.discovery.k8s.io "httpd-4hf9v" deleted
endpointslice.discovery.k8s.io "httpd-9bw7j" deleted

$ oc get endpointslice
NAME          ADDRESSTYPE   PORTS       ENDPOINTS     AGE
httpd-qw8k4   IPv4          8443,8080   10.129.2.16   2s

$ oc get endpointslice httpd-qw8k4 -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1beta1
endpoints:
- addresses:
  - 10.129.2.16
  conditions:
    ready: true
  targetRef:
    kind: Pod
    name: httpd-7c7ccfffdc-mqps8
    namespace: test
    resourceVersion: "242425"
    uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
  topology:
    kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
kind: EndpointSlice
metadata:
  creationTimestamp: "2021-05-19T14:39:35Z"
  generateName: httpd-
  generation: 1
  labels:
    app: httpd
    app.kubernetes.io/component: httpd
    app.kubernetes.io/instance: httpd
    endpointslice.kubernetes.io/managed-by: endpointslicemirroring-controller.k8s.io
    kubernetes.io/service-name: httpd
  manager: kube-controller-manager
  operation: Update
  time: "2021-05-19T14:39:35Z"
  name: httpd-qw8k4
  namespace: test
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Endpoints                <--- Endpoints ownerRef
    name: httpd
    uid: 0291165b-95cf-403c-8f06-2e39c1ab81fb
  resourceVersion: "247199"
  selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-qw8k4
  uid: 9f0ba243-2f3e-4333-b24f-cba8a5491dd5
ports:
- name: 8443-tcp
  port: 8443
  protocol: TCP
- name: 8080-tcp
  port: 8080
  protocol: TCP

$ oc get endpoints
NAME    ENDPOINTS                           AGE
httpd   10.129.2.16:8080,10.129.2.16:8443   30h

[2]

$ oc get endpointslice
NAME          ADDRESSTYPE   PORTS       ENDPOINTS     AGE
httpd-mk9gg   IPv4          8443,8080   10.129.2.16   9s
httpd-qw8k4   IPv4          8443,8080   10.129.2.16   10m

$ oc delete endpointslice httpd-mk9gg httpd-qw8k4
endpointslice.discovery.k8s.io "httpd-mk9gg" deleted
endpointslice.discovery.k8s.io "httpd-qw8k4" deleted

$ oc get endpointslice
NAME          ADDRESSTYPE   PORTS       ENDPOINTS     AGE
httpd-vlw5r   IPv4          8443,8080   10.129.2.16   2s

$ oc get endpointslice httpd-vlw5r -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1beta1
endpoints:
- addresses:
  - 10.129.2.16
  conditions:
    ready: true
  targetRef:
    kind: Pod
    name: httpd-7c7ccfffdc-mqps8
    namespace: test
    resourceVersion: "242425"
    uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
  topology:
    kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
    topology.kubernetes.io/region: ap-northeast-1
    topology.kubernetes.io/zone: ap-northeast-1a
kind: EndpointSlice
metadata:
  creationTimestamp: "2021-05-19T14:50:38Z"
  generateName: httpd-
  generation: 1
  labels:
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
    kubernetes.io/service-name: httpd
  manager: kube-controller-manager
  operation: Update
  time: "2021-05-19T14:50:38Z"
  name: httpd-vlw5r
  namespace: test
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service                  <--- Service ownerRef
    name: httpd
    uid: 07f8582e-969b-4f52-bd74-f73a56a8bfae
  resourceVersion: "250314"
  selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-vlw5r
  uid: e124efcc-a57f-439f-8d31-3ddf5ed4c6f0
ports:
- name: 8443-tcp
  port: 8443
  protocol: TCP
- name: 8080-tcp
  port: 8080
  protocol: TCP

$ oc get endpoints
NAME    ENDPOINTS                           AGE
httpd   10.129.2.16:8080,10.129.2.16:8443   30h
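For the workaround above, a quick way to tell the two slices apart without reading the full YAML is to print the managed-by label as an extra column. This is only a sketch: the -L flag is a standard oc/kubectl option, and the slice name used in the delete example is the Service-owned one from this test ([0]), which is the stale one in the selector-removed case.

# Show which controller manages each slice: "endpointslice-controller.k8s.io"
# (Service-owned) vs. "endpointslicemirroring-controller.k8s.io" (Endpoints-owned).
$ oc -n test get endpointslices -L endpointslice.kubernetes.io/managed-by

# Workaround: delete the slice that no longer matches the current "selector"
# state; the controller that is still responsible recreates its own slice.
$ oc -n test delete endpointslice httpd-4hf9v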
Thanks for sharing your test cases in detail.

I was able to reproduce this issue on 4.8.0-0.ci-2021-05-19-081203 using the following trivial reproducer:

oc new-project test
oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
oc expose pod/hello-openshift
oc expose service/hello-openshift
oc patch service hello-openshift --patch '{"spec":{"selector":null}}'
observe reload failure in router pod logs

> I agree with you. It's not usual, and removing the "selector" means the user has to manage their own Endpoints themselves. But this process of enabling self-managed endpoints can affect a running router, as in this issue, which is an unexpected result for users.
> If possible, it would be better to make the router more robust here, or at least document the behavior, to avoid this kind of surprise.

I agree. I will work with my team to decide the best path forward here and get back to you soon. Thank you for your patience as we work through higher priority issues.

Note that right now in OCP, the customer could take advantage of the following metric: `template_router_reload_failure` should return `1` if any router pod is currently in a "wedged" state, meaning the router pod cannot reload into the newest HAProxy configuration. This metric is leveraged by the `HAProxyReloadFail` alert, which should fire if the `template_router_reload_failure` metric has been returning `1` for at least 5 minutes.

So, if the customer is concerned about other edge cases causing issues like this in the future, they can simply monitor these metrics/alerts. Thanks!
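As a concrete way to keep an eye on this, here is a minimal sketch of checking for a wedged router from the CLI. It assumes the default router deployment name router-default in openshift-ingress (as used earlier in this bug); the PromQL expression in the comment uses the `template_router_reload_failure` metric named above and would normally be run in the console's metrics UI or against the cluster Prometheus.

# Quick check of recent router logs for failed reloads.
$ oc -n openshift-ingress logs deploy/router-default --since=15m | grep -i 'error reloading router' \
    || echo "no reload errors in the last 15 minutes"

# In the metrics UI (or any Prometheus client), this expression returns 1 for
# any router pod that is currently wedged; the HAProxyReloadFail alert fires
# when it stays at 1 for 5 minutes:
#   max by (pod) (template_router_reload_failure) == 1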
> I was able to reproduce this issue on 4.8.0-0.ci-2021-05-19-081203 using the following trivial reproducer:

Thank you for your testing. Unfortunately, the upstream fix does not help with this issue.

> This metric is leveraged by the `HAProxyReloadFail` alert, which should fire if the `template_router_reload_failure` metric has been returning `1` for at least 5 minutes.
> So, if the customer is concerned about other edge cases causing issues like this in the future, they can simply monitor these metrics/alerts.

Great suggestion! It will be helpful for better management of the router configuration. Thank you again for the information and for your work!
I spoke with my team and we have decided that it would be wise to merge something along the lines of https://github.com/openshift/router/pull/285. When I have time I will review the proposed patch and see if there are any performance implications associated with it (or if there's a better way to implement the fix in general). Thanks!
Verified in "4.8.0-0.nightly-2021-05-21-233425" payload. With this release, there are no more router reload errors when the selector is removed for a service mapped to a route: ----- oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.8.0-0.nightly-2021-05-21-233425 True False 4h16m Cluster version is 4.8.0-0.nightly-2021-05-21-233425 oc get all NAME READY STATUS RESTARTS AGE pod/hello-openshift 1/1 Running 0 93m NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/hello-openshift ClusterIP 172.30.92.91 <none> 8080/TCP 93m NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD route.route.openshift.io/hello-openshift hello-openshift-test1.apps.aiyengar4824.qe.devcluster.openshift.com hello-openshift 8080 None apiVersion: v1 kind: Service metadata: creationTimestamp: "2021-05-24T08:53:50Z" labels: name: hello-openshift name: hello-openshift namespace: test1 resourceVersion: "97341" uid: 0a4c84e9-af89-46a4-86bd-787a9aaeebb3 spec: clusterIP: 172.30.92.91 clusterIPs: - 172.30.92.91 ipFamilies: - IPv4 ipFamilyPolicy: SingleStack ports: - port: 8080 protocol: TCP targetPort: 8080 sessionAffinity: None type: ClusterIP status: loadBalancer: {} oc -n openshift-ingress logs router-default-56b4fbb5ff-f4nrd --tail 50 I0524 05:47:04.382092 1 template.go:433] router "msg"="starting router" "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: c7b3985da3d1341fdac33f4d6bb6994fe29d32b7\nversionFromGit: 4.0.0-299-gc7b3985d\ngitTreeState: clean\nbuildDate: 2021-05-21T18:48:49Z\n" I0524 05:47:04.383766 1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS" "address"="0.0.0.0:1936" I0524 05:47:04.388599 1 router.go:191] template "msg"="creating a new template router" "writeDir"="/var/lib/haproxy" I0524 05:47:04.388679 1 router.go:270] template "msg"="router will coalesce reloads within an interval of each other" "interval"="5s" I0524 05:47:04.389386 1 router.go:332] template "msg"="watching for changes" "path"="/etc/pki/tls/private" I0524 05:47:04.389442 1 router.go:262] router "msg"="router is including routes in all namespaces" E0524 05:47:04.494692 1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory I0524 05:47:04.530597 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:47:09.521271 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:47:35.597216 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:47:40.569203 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:47:45.606777 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:54:19.025876 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:54:35.750379 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:55:09.203483 1 
router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 05:55:14.196201 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 06:07:52.951984 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 06:08:25.395947 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 06:08:30.377415 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" I0524 06:08:55.400129 1 router.go:579] template "msg"="router reloaded" "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n" ------
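For anyone re-verifying, a compact check on a fixed build might look like the following. It reuses the trivial reproducer shared earlier in this bug and simply confirms the router keeps reloading cleanly; the test1 namespace and service name match the verification output above and are otherwise arbitrary.

# Remove the selector from the exposed service (per the earlier reproducer).
$ oc -n test1 patch service hello-openshift --patch '{"spec":{"selector":null}}'

# On a fixed router build there should be no new reload failures afterwards.
$ oc -n openshift-ingress logs deploy/router-default --since=5m | grep -ci 'error reloading router'
0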
xref https://github.com/kubernetes/kubernetes/issues/103576
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438
The customer hit the same error in an OCP 4.7.13 environment.

The fix commit was added on 2021-05-22:
https://github.com/openshift/router/commit/8e5e70b4164d4fc2f2515d431892f2b1c803f0ed

And I confirmed that openshift4/ose-haproxy-router 4.7.16 (buildDate: 2021-06-03T23:22:11Z) or later has this issue fixed.

OpenShift Container Platform 4.7.16 container image list
https://access.redhat.com/solutions/6115681

# podman run -it --entrypoint=/usr/bin/openshift-router openshift4/ose-haproxy-router:v4.7.0-202106032231.p0.git.5a0e656 version
openshift-router
majorFromGit:
minorFromGit:
commitFromGit: 5a0e6561b0480df9f32a8ef87a54a1dc4cf91b93
versionFromGit: 4.0.0-268-g5a0e6561
gitTreeState: clean
buildDate: 2021-06-03T23:22:11Z
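To run the same check from inside a running cluster (rather than pulling the image with podman), one could run the router binary's version subcommand in a router pod. This is only a sketch: the binary path and the version subcommand are the ones shown in the podman example above, while the pod label is an assumption based on the default ingress controller's router deployment.

$ ROUTER_POD=$(oc -n openshift-ingress get pods -l ingresscontroller.operator.openshift.io/deployment-ingresscontroller=default -o jsonpath='{.items[0].metadata.name}')

# Per the confirmation above, a 4.7.z build with buildDate 2021-06-03T23:22:11Z
# or later includes the fix commit (8e5e70b).
$ oc -n openshift-ingress exec "$ROUTER_POD" -- /usr/bin/openshift-router version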