Bug 1961550 - HAproxy pod logs showing error "another server named 'pod:httpd-7c7ccfffdc-wdkvk:httpd:8080-tcp:10.128.x.x:8080' was already defined at line 326, please use distinct names"
Reported: 2021-05-18 09:00 UTC by Jobin A T
Modified: 2023-03-22 09:35 UTC

Cause: Removing the selector from a service exposed via a route.
Consequence: Duplicate EndpointSlices are created for the service's pods, triggering HAProxy reload errors due to duplicate server entries.
Fix: Filter out accidental duplicate server lines when writing out the HAProxy config file.
Result: Deleting the selector from a service no longer bricks the router.

Description of problem:
HAProxy does not generate a valid configuration if the "selector" is removed from a service while the old Endpoints remain.

Version-Release number of selected component (if applicable):
Red Hat OpenShift Container Platform (RHOCP) 4.6

1. Prepare the test pod and service as follows.
  $ oc new-project test
  $ oc new-app httpd -n test
  $ oc get pod -o wide -n test
  NAME                     READY   STATUS    RESTARTS   AGE   IP           
  httpd-7c7ccfffdc-wdkvk   1/1     Running   0          66s   <--- Pod IP

  $ oc describe svc httpd -n test
  Name:              httpd
  Namespace:         test
  Selector:          deployment=httpd   <--- The pod above is selected if its labels match this selector.
  Type:              ClusterIP
  Port:              8080-tcp  8080/TCP
  TargetPort:        8080/TCP
  Endpoints:    <--- The Endpoints match the pod IP.
  Port:              8443-tcp  8443/TCP
  TargetPort:        8443/TCP
  Endpoints:    <--- The Endpoints match the pod IP.
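For readers less familiar with how a Service picks its pods, here is a minimal Go sketch of equality-based selector matching. It assumes plain `map[string]string` labels rather than the real Kubernetes API types, and `selectorMatches` is a hypothetical helper. A nil selector is treated as selecting nothing, because the endpoints controller leaves a Service without a selector unmanaged, which is the situation step 2 creates.

```go
package main

import "fmt"

// selectorMatches reports whether a Service's spec.selector selects a pod
// with the given labels: every selector key must be present on the pod with
// the same value (equality-based matching only).
//
// A nil selector means the Service has no selector at all. The endpoints
// controller does not manage Endpoints for such a Service, so this sketch
// reports "no match" instead of "match everything".
func selectorMatches(selector, podLabels map[string]string) bool {
	if selector == nil {
		return false
	}
	for key, want := range selector {
		got, ok := podLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical pod labels modeled on the httpd pod in step 1.
	pod := map[string]string{"deployment": "httpd"}

	fmt.Println(selectorMatches(map[string]string{"deployment": "httpd"}, pod)) // true
	fmt.Println(selectorMatches(nil, pod))                                      // false
}
```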

2. Remove the "selector" field.
  $ oc replace -f - <<EOF
  apiVersion: v1
  kind: Service
  metadata:
    labels:
      app: httpd
      app.kubernetes.io/component: httpd
      app.kubernetes.io/instance: httpd
    name: httpd
    namespace: test
  spec:
    ports:
    - name: 8080-tcp
      port: 8080
      protocol: TCP
      targetPort: 8080
    - name: 8443-tcp
      port: 8443
      protocol: TCP
      targetPort: 8443
    sessionAffinity: None
    type: ClusterIP
  EOF

  $ oc describe svc httpd -n test
  Name:              httpd
  Namespace:         test
  Selector:          <none>             <--- The "selector" value was removed, so the Endpoints IPs are no longer synced.
  Type:              ClusterIP
  Port:              8080-tcp  8080/TCP
  TargetPort:        8080/TCP
  Endpoints:    <--- The old Endpoints remain.
  Port:              8443-tcp  8443/TCP
  TargetPort:        8443/TCP
  Endpoints:    <--- The old Endpoints remain.

3. Check the Endpoints after restarting the test pod.
   The existing service no longer syncs because its "selector" was removed; this is expected behavior.
  $ oc delete pod httpd-7c7ccfffdc-wdkvk -n test
  $ oc get pod -o wide -n test
  NAME                     READY   STATUS    RESTARTS   AGE   IP        
  httpd-7c7ccfffdc-hd2dj   1/1     Running   0          19s <--- The pod IP changed after the restart.

  $ oc describe svc httpd -n test
  Name:              httpd
  Namespace:         test
  Selector:          <none>             <--- The "selector" value was removed, so the Endpoints IPs are no longer synced.
  Type:              ClusterIP
  Port:              8080-tcp  8080/TCP
  TargetPort:        8080/TCP
  Endpoints:    <--- The old Endpoints remain.
  Port:              8443-tcp  8443/TCP
  TargetPort:        8443/TCP
  Endpoints:    <--- The old Endpoints remain.

4. Expose the service; the issue is then reproduced.

  $ oc expose svc httpd -n test
  $ oc logs -n openshift-ingress deploy/router-default
  E0518 06:47:25.288227       1 limiter.go:165] error reloading router: exit status 1
  [ALERT] 137/064725 (221) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-wdkvk:httpd:8080-tcp:' was already defined at line 326, please use distinct names.
  [ALERT] 137/064725 (221) : Fatal errors found in configuration.

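HAProxy refuses to load a backend that defines the same server name twice, which is what the ALERT above reports. The following Go sketch shows a simplified, line-based check for that condition. `findDuplicateServers` is a hypothetical helper, not part of the router or HAProxy; it only understands section headers and `server` lines, and the IP address in the sample config is a placeholder for the redacted pod IP.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// duplicateServer describes a server name that appears twice in one backend.
type duplicateServer struct {
	Backend   string
	Name      string
	FirstLine int // 1-based line of the first definition
	DupLine   int // 1-based line of the repeated definition
}

// findDuplicateServers scans HAProxy configuration text and reports every
// "server <name> ..." line whose name was already used in the same backend.
// Only "backend" sections are tracked; any other section header resets state.
func findDuplicateServers(config string) []duplicateServer {
	var dups []duplicateServer
	backend := ""
	seen := map[string]int{}
	scanner := bufio.NewScanner(strings.NewReader(config))
	lineNo := 0
	for scanner.Scan() {
		lineNo++
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 {
			continue
		}
		switch fields[0] {
		case "global", "defaults", "frontend", "listen", "backend":
			// A new section starts: forget the previous backend's servers.
			backend = ""
			seen = map[string]int{}
			if fields[0] == "backend" && len(fields) > 1 {
				backend = fields[1]
			}
		case "server":
			if backend == "" || len(fields) < 2 {
				continue
			}
			name := fields[1]
			if first, ok := seen[name]; ok {
				dups = append(dups, duplicateServer{Backend: backend, Name: name, FirstLine: first, DupLine: lineNo})
			} else {
				seen[name] = lineNo
			}
		}
	}
	return dups
}

func main() {
	// Hypothetical backend modeled on the ALERT above; 10.128.0.10 is a placeholder.
	cfg := "backend be_http:test:httpd\n" +
		"  server pod:httpd-7c7ccfffdc-wdkvk:httpd:8080-tcp:10.128.0.10:8080 10.128.0.10:8080\n" +
		"  server pod:httpd-7c7ccfffdc-wdkvk:httpd:8080-tcp:10.128.0.10:8080 10.128.0.10:8080\n"
	for _, d := range findDuplicateServers(cfg) {
		fmt.Printf("backend '%s': server '%s' on line %d was already defined at line %d\n",
			d.Backend, d.Name, d.DupLine, d.FirstLine)
	}
}
```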

Comment 1 Stephen Greene 2021-05-18 17:47:41 UTC
(In reply to Jobin A T from comment #0)
> HAproxy is not generating the config correctly if the "selector" was removed
> from the service and old Endpoints were remained.

By the way, in OCP 4.6 the router observes EndpointSlice resources instead of Endpoints.

> Version-Release number of selected component (if applicable):
> Red Hat OpenShift Container Platform (RHOCP) 4.6

Was the customer not observing this issue on OCP 4.5? What z version of 4.6.z is the customer using specifically?

> Expected results:

What exactly is the outcome expected by the customer in this situation?
When you remove the label-selector from a service, you are essentially putting the service in an unmanaged state, so updating the relevant endpoint/endpointslice resources to avoid HAProxy backend collisions would be your responsibility (instead of the endpoint/endpointslice mirroring controller's responsibility).

It's not clear what we can do to resolve this issue, since removing the service selector is edging into unsupported territory. Does the customer have different expectations for how the router should behave in this situation?

Tagging need-info as we try to boil this BZ down to a solvable issue (if that's possible). Leaving in NEW state until then.

Comment 2 Daein Park 2021-05-19 06:26:31 UTC
Hi team,

EndpointSlices can be duplicated when the selector is removed; see the output in [1].
In other words, the same endpoint ID can appear twice, so identical server records are added, as shown in the debug logs in [2].
This causes the router pods to fail on the next restart, as follows:

  [ALERT] 138/033357 (18) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-fc294:httpd:8080-tcp:' was already defined at line 326, please use distinct names.
  [ALERT] 138/033357 (18) : Fatal errors found in configuration.
  I0519 03:34:30.934144       1 template.go:690] router "msg"="Shutdown requested, waiting 45s for new connections to cease"

In the latest Kubernetes (1.21; I'm not sure whether the upstream fix will be backported to OCP 4), the EndpointSlice controller fix [0] may suppress this issue.

  [0] Updating EndpointSlice controllers to avoid duplicate creations
    - https://github.com/kubernetes/kubernetes/pull/100103

Regardless of this issue, I think the router should check whether an endpoint ID is duplicated before adding server records to "haproxy.config", so that it keeps running stably.
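A minimal Go sketch of the suggested check, assuming a simplified, hypothetical `endpoint` struct (not the router's own type) and a server name format inferred from the error message in this report (`pod:<pod>:<service>:<port name>:<ip>:<port>`). It drops any endpoint whose derived server name was already seen, keeping the first occurrence.

```go
package main

import "fmt"

// endpoint is a hypothetical, simplified record for one address/port pair
// taken from an EndpointSlice subset; it is not an openshift/router type.
type endpoint struct {
	PodName  string
	Service  string
	PortName string
	IP       string
	Port     int
}

// serverName builds a name in the shape seen in this report's error message.
// The exact format used by the router is an assumption.
func serverName(ep endpoint) string {
	return fmt.Sprintf("pod:%s:%s:%s:%s:%d", ep.PodName, ep.Service, ep.PortName, ep.IP, ep.Port)
}

// dedupeByServerName drops endpoints whose server name was already seen,
// keeping the first occurrence so the order of server lines stays stable.
func dedupeByServerName(in []endpoint) []endpoint {
	seen := make(map[string]bool, len(in))
	out := make([]endpoint, 0, len(in))
	for _, ep := range in {
		name := serverName(ep)
		if seen[name] {
			continue
		}
		seen[name] = true
		out = append(out, ep)
	}
	return out
}

func main() {
	// Two subsets from the duplicated EndpointSlices describe the same pod;
	// 10.128.0.10 is a placeholder for the redacted pod IP.
	ep := endpoint{PodName: "httpd-7c7ccfffdc-fc294", Service: "httpd", PortName: "8080-tcp", IP: "10.128.0.10", Port: 8080}
	for _, e := range dedupeByServerName([]endpoint{ep, ep}) {
		fmt.Println(serverName(e))
	}
}
```

Deduplicating on the derived name, rather than on the whole record, targets exactly the collision that HAProxy rejects.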

[1] As soon as the "selector" field is removed, a duplicate EndpointSlice is generated.
    The two EndpointSlices are generated from different sources:
    the original one is generated from the Service, and the other one is generated from the Endpoints (for manual Endpoints management).
$ oc get endpointslice,endpoints -o wide
NAME                                         ADDRESSTYPE   PORTS       ENDPOINTS    AGE
endpointslice.discovery.k8s.io/httpd-5sg47   IPv4          8443,8080   19m  <--- OwnerRef is the Endpoints. This one was generated after removing the "selector".
endpointslice.discovery.k8s.io/httpd-qr7hh   IPv4          8443,8080   17h  <--- OwnerRef is the Service.

NAME              ENDPOINTS                         AGE
endpoints/httpd,   17h
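To spot this state across many services, one could group EndpointSlices by their `kubernetes.io/service-name` label and flag any service whose slices carry more than one `endpointslice.kubernetes.io/managed-by` value. The Go sketch below does that over a hypothetical, minimal `endpointSlice` type (name and labels only), assuming all slices come from one namespace; the sample names and label values are the ones shown in the EndpointSlice YAML in Comment 5.

```go
package main

import (
	"fmt"
	"sort"
)

// Label keys as they appear in the EndpointSlice YAML in this report.
const (
	serviceNameLabel = "kubernetes.io/service-name"
	managedByLabel   = "endpointslice.kubernetes.io/managed-by"
)

// endpointSlice is a hypothetical, minimal view of an EndpointSlice; real
// tooling would read discovery.k8s.io objects from the API server.
type endpointSlice struct {
	Name   string
	Labels map[string]string
}

// mixedManagers returns, for each service whose slices carry more than one
// distinct managed-by value, the sorted list of those values.
func mixedManagers(slices []endpointSlice) map[string][]string {
	byService := map[string]map[string]bool{}
	for _, s := range slices {
		svc := s.Labels[serviceNameLabel]
		if svc == "" {
			continue // not associated with a Service
		}
		if byService[svc] == nil {
			byService[svc] = map[string]bool{}
		}
		byService[svc][s.Labels[managedByLabel]] = true
	}
	result := map[string][]string{}
	for svc, managers := range byService {
		if len(managers) < 2 {
			continue
		}
		names := make([]string, 0, len(managers))
		for m := range managers {
			names = append(names, m)
		}
		sort.Strings(names)
		result[svc] = names
	}
	return result
}

func main() {
	slices := []endpointSlice{
		{Name: "httpd-4hf9v", Labels: map[string]string{serviceNameLabel: "httpd", managedByLabel: "endpointslice-controller.k8s.io"}},
		{Name: "httpd-9bw7j", Labels: map[string]string{serviceNameLabel: "httpd", managedByLabel: "endpointslicemirroring-controller.k8s.io"}},
	}
	fmt.Println(mixedManagers(slices))
}
```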

[2] The duplicate EndpointSlices were added (see the "processing subset" logs), which causes the parsing error.
I0519 01:09:57.252744       7 plugin.go:178] template "msg"="processing endpoints"  "endpointCount"=2 "eventType"="MODIFIED" "name"="httpd" "namespace"="test"
I0519 01:09:57.252802       7 plugin.go:181] template "msg"="processing subset"  "index"=0 "subset"={"addresses":[{"ip":"","targetRef":{"kind":"Pod","namespace":"test","name":"httpd-7c7ccfffdc-fc294","uid":"4f918dc5-d020-44c9-ba7c-6e87009f33f0","resourceVersion":"105002"}}],"ports":[{"name":"8080-tcp","port":8080,"protocol":"TCP"},{"name":"8443-tcp","port":8443,"protocol":"TCP"}]}
I0519 01:09:57.252826       7 plugin.go:181] template "msg"="processing subset"  "index"=1 "subset"={"addresses":[{"ip":"","targetRef":{"kind":"Pod","namespace":"test","name":"httpd-7c7ccfffdc-fc294","uid":"4f918dc5-d020-44c9-ba7c-6e87009f33f0","resourceVersion":"105002"}}],"ports":[{"name":"8080-tcp","port":8080,"protocol":"TCP"},{"name":"8443-tcp","port":8443,"protocol":"TCP"}]}
I0519 01:09:57.252846       7 plugin.go:190] template "msg"="modifying endpoints"  "key"="test/httpd"
I0519 01:09:57.252923       7 router.go:445] template "msg"="writing the router config"  
I0519 01:09:57.252984       7 router.go:499] template "msg"="committing router certificate manager changes..."  
I0519 01:09:57.252998       7 router.go:504] template "msg"="router certificate manager config committed"  
I0519 01:09:57.257998       7 router.go:455] template "msg"="calling reload function"  "fn"=0
I0519 01:09:57.259462       7 router.go:459] template "msg"="reloading the router"  
E0519 01:09:57.277337       7 limiter.go:165] error reloading router: exit status 1
[ALERT] 138/010957 (80) : parsing [/var/lib/haproxy/conf/haproxy.config:327] : backend 'be_http:test:httpd', another server named 'pod:httpd-7c7ccfffdc-fc294:httpd:8080-tcp:' was already defined at line 326, please use distinct names.

Comment 3 Daein Park 2021-05-19 08:36:32 UTC
FYI, I've also opened PR here: https://github.com/openshift/router/pull/285

Comment 4 Stephen Greene 2021-05-19 13:52:16 UTC
(In reply to Daein Park from comment #3)
> FYI, I've also opened PR here: https://github.com/openshift/router/pull/285

Thanks for sharing a potential patch! I will check with my team and see if this is the approach we want to take.

In the meantime, could you include the full endpointslice yaml for the 2 endpointslices you mention in Comment 2 (this will help me better understand the issue)?
Can you safely delete the endpointslice that has the service as its owner ref? Is that at least a workaround here?
Again, deleting the service selector from a clusterIP service does not seem like a supported/typical flow, but I suppose we can better handle the outcome of doing so.

Also, note that https://github.com/kubernetes/kubernetes/pull/100103 should be included in OCP 4.8. If I get around to it, I will try to reproduce this issue on 4.6 and see if I can also reproduce this issue on 4.8 latest. It's not clear to me how that upstream patch would resolve the issue, but I agree that it could perhaps make a difference.

Comment 5 Daein Park 2021-05-19 15:23:25 UTC
@Stephen Greene Thank you for your prompt update.

> In the meantime, could you include the full endpointslice yaml for the 2 endpointslices you mention in Comment 2 (this will help me better understand the issue)?

Please refer to [0].

> Can you safely delete the endpointslice that has the service as it's owner ref? Is that at least a workaround here?

Yes, you're right. The workaround is to remove the other, invalid/unmanaged EndpointSlice.
In more detail, the result depends on the state of the "selector" field in the existing service.

  1. If the "selector" has been removed from the service, then after all EndpointSlices are deleted, only the EndpointSlice with an Endpoints OwnerRef is regenerated automatically.
     See [1] for my test results.

  2. If the "selector" is set on the service again, then after all EndpointSlices are deleted, only the EndpointSlice with a Service OwnerRef is regenerated automatically.
     See [2] for my test results.
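The rule behind the two cases above can be expressed as a small Go function that lists the EndpointSlices whose `endpointslice.kubernetes.io/managed-by` label does not match the controller expected for the Service's selector state. This is a hypothetical helper over a simplified `endpointSlice` type, with label values taken from the YAML in [0]; the tested workaround remains deleting the slices and letting the expected one be regenerated.

```go
package main

import "fmt"

const (
	managedByLabel      = "endpointslice.kubernetes.io/managed-by"
	sliceController     = "endpointslice-controller.k8s.io"          // slices owned by the Service
	mirroringController = "endpointslicemirroring-controller.k8s.io" // slices owned by the Endpoints
)

// endpointSlice is a hypothetical, minimal view of an EndpointSlice.
type endpointSlice struct {
	Name   string
	Labels map[string]string
}

// unexpectedSlices returns the names of slices whose managed-by label does not
// match the controller expected for the Service's selector state: the
// EndpointSlice controller when a selector is set, the mirroring controller
// when it is not.
func unexpectedSlices(serviceHasSelector bool, slices []endpointSlice) []string {
	expected := mirroringController
	if serviceHasSelector {
		expected = sliceController
	}
	var names []string
	for _, s := range slices {
		if s.Labels[managedByLabel] != expected {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	slices := []endpointSlice{
		{Name: "httpd-4hf9v", Labels: map[string]string{managedByLabel: sliceController}},
		{Name: "httpd-9bw7j", Labels: map[string]string{managedByLabel: mirroringController}},
	}
	// After the selector is removed, the Service-owned slice is the leftover one.
	fmt.Println(unexpectedSlices(false, slices)) // [httpd-4hf9v]
}
```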

> Again, deleting the service selector from a clusterIP service does not seem like a supported/typical flow, but I suppose we can better handle the outcome of doing so.

I agree. It's not usual, and removing the "selector" means the user must manage the endpoints themselves.
However, switching to self-managed endpoints this way can break a running router, as in this issue, which is unexpected for users.
If possible, we should improve how the router handles this case, or provide clear documentation to help users avoid this unexpected issue.

> Also, note that https://github.com/kubernetes/kubernetes/pull/100103 should be included in OCP 4.8. If I get around to it, I will try to reproduce this issue on 4.6 and see if I can also reproduce this issue on 4.8 latest. It's not clear to me how that upstream patch would resolve the issue, but I agree that it could perhaps make a difference.

As you mentioned, it may not be related to this issue; it appears to address only duplicate creations of the same EndpointSlice...
But I think it is worth testing, because the fix can change the current EndpointSlice creation behavior.

[0] endpointslices
$ oc get endpointslices
httpd-4hf9v   IPv4          8443,8080   109m  
httpd-9bw7j   IPv4          8443,8080   6s     <--- Generated after removing the "selector" field from the service.

$ oc get endpointslices -o yaml
apiVersion: v1
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1beta1
  - addresses:
      ready: true
      kind: Pod
      name: httpd-7c7ccfffdc-mqps8
      namespace: test
      resourceVersion: "242425"
      uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
      kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
      topology.kubernetes.io/region: ap-northeast-1
      topology.kubernetes.io/zone: ap-northeast-1a
  kind: EndpointSlice
      endpoints.kubernetes.io/last-change-trigger-time: "2021-05-18T08:03:23Z"
    creationTimestamp: "2021-05-19T12:45:39Z"
    generateName: httpd-
    generation: 2
      endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
      kubernetes.io/service-name: httpd
      manager: kube-controller-manager
      operation: Update
      time: "2021-05-19T14:30:38Z"
    name: httpd-4hf9v
    namespace: test
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Service
      name: httpd
      uid: 07f8582e-969b-4f52-bd74-f73a56a8bfae
    resourceVersion: "243485"
    selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-4hf9v
    uid: f8b63b4a-fd61-452b-b3b4-0c5e75e6d84a
  - name: 8443-tcp
    port: 8443
    protocol: TCP
  - name: 8080-tcp
    port: 8080
    protocol: TCP
- addressType: IPv4
  apiVersion: discovery.k8s.io/v1beta1
  - addresses:
      ready: true
      kind: Pod
      name: httpd-7c7ccfffdc-mqps8
      namespace: test
      resourceVersion: "242425"
      uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
      kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
  kind: EndpointSlice
    creationTimestamp: "2021-05-19T14:34:34Z"
    generateName: httpd-
    generation: 1
      app: httpd
      app.kubernetes.io/component: httpd
      app.kubernetes.io/instance: httpd
      endpointslice.kubernetes.io/managed-by: endpointslicemirroring-controller.k8s.io
      kubernetes.io/service-name: httpd
      manager: kube-controller-manager
      operation: Update
      time: "2021-05-19T14:34:34Z"
    name: httpd-9bw7j
    namespace: test
    - apiVersion: v1
      blockOwnerDeletion: true
      controller: true
      kind: Endpoints
      name: httpd
      uid: 0291165b-95cf-403c-8f06-2e39c1ab81fb
    resourceVersion: "245803"
    selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-9bw7j
    uid: 02c9837a-7568-4d91-9dd0-948c1cc4a07c
  - name: 8443-tcp
    port: 8443
    protocol: TCP
  - name: 8080-tcp
    port: 8080
    protocol: TCP
kind: List
  resourceVersion: ""
  selfLink: ""

$ oc delete endpointslice httpd-4hf9v httpd-9bw7j
endpointslice.discovery.k8s.io "httpd-4hf9v" deleted
endpointslice.discovery.k8s.io "httpd-9bw7j" deleted

$ oc get endpointslice
httpd-qw8k4   IPv4          8443,8080   2s
$ oc get endpointslice httpd-qw8k4 -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1beta1
- addresses:
    ready: true
    kind: Pod
    name: httpd-7c7ccfffdc-mqps8
    namespace: test
    resourceVersion: "242425"
    uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
    kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
kind: EndpointSlice
  creationTimestamp: "2021-05-19T14:39:35Z"
  generateName: httpd-
  generation: 1
    app: httpd
    app.kubernetes.io/component: httpd
    app.kubernetes.io/instance: httpd
    endpointslice.kubernetes.io/managed-by: endpointslicemirroring-controller.k8s.io
    kubernetes.io/service-name: httpd
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-19T14:39:35Z"
  name: httpd-qw8k4
  namespace: test
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Endpoints                            <--- Endpoints ownerRef
    name: httpd
    uid: 0291165b-95cf-403c-8f06-2e39c1ab81fb
  resourceVersion: "247199"
  selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-qw8k4
  uid: 9f0ba243-2f3e-4333-b24f-cba8a5491dd5
- name: 8443-tcp
  port: 8443
  protocol: TCP
- name: 8080-tcp
  port: 8080
  protocol: TCP
$ oc get endpoints
NAME    ENDPOINTS                           AGE
httpd,   30h

$ oc get endpointslice
httpd-mk9gg   IPv4          8443,8080   9s
httpd-qw8k4   IPv4          8443,8080   10m
$ oc delete endpointslice httpd-mk9gg httpd-qw8k4
endpointslice.discovery.k8s.io "httpd-mk9gg" deleted
endpointslice.discovery.k8s.io "httpd-qw8k4" deleted

$ oc get endpointslice
httpd-vlw5r   IPv4          8443,8080   2s

$ oc get endpointslice httpd-vlw5r -o yaml
addressType: IPv4
apiVersion: discovery.k8s.io/v1beta1
- addresses:
    ready: true
    kind: Pod
    name: httpd-7c7ccfffdc-mqps8
    namespace: test
    resourceVersion: "242425"
    uid: 671865b5-f7e9-41fd-911e-cf53b8fc6e2c
    kubernetes.io/hostname: ip-10-0-186-33.ap-northeast-1.compute.internal
    topology.kubernetes.io/region: ap-northeast-1
    topology.kubernetes.io/zone: ap-northeast-1a
kind: EndpointSlice
  creationTimestamp: "2021-05-19T14:50:38Z"
  generateName: httpd-
  generation: 1
    endpointslice.kubernetes.io/managed-by: endpointslice-controller.k8s.io
    kubernetes.io/service-name: httpd
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-19T14:50:38Z"
  name: httpd-vlw5r
  namespace: test
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: Service                                          <--- Service ownerRef
    name: httpd
    uid: 07f8582e-969b-4f52-bd74-f73a56a8bfae
  resourceVersion: "250314"
  selfLink: /apis/discovery.k8s.io/v1beta1/namespaces/test/endpointslices/httpd-vlw5r
  uid: e124efcc-a57f-439f-8d31-3ddf5ed4c6f0
- name: 8443-tcp
  port: 8443
  protocol: TCP
- name: 8080-tcp
  port: 8080
  protocol: TCP
$ oc get endpoints
NAME    ENDPOINTS                           AGE
httpd,   30h

Comment 6 Stephen Greene 2021-05-19 19:18:29 UTC
Thanks for sharing your test cases in detail.

I was able to reproduce this issue on 4.8.0-0.ci-2021-05-19-081203 using the following trivial reproducer:

oc new-project test
oc create -f https://raw.githubusercontent.com/openshift/origin/master/examples/hello-openshift/hello-pod.json
oc expose pod/hello-openshift
oc expose service/hello-openshift
oc patch service hello-openshift --patch '{"spec":{"selector":null}}'

Then observe the reload failure in the router pod logs.

> I also agree with your thought. It's not usual, and to remove "selector" means user should manage their own endpoints by himself. But, this process to enable self-managed endpoints may affect running router like this issue, it's unexpected result for users.
> If possible, we would better enhance the process in the router or providing kind documentations for suppressing unexpected issue.

I agree. I will work with my team to decide the best path forward here and get back to you soon. Thank you for your patience as we work through higher priority issues.

Note that right now in OCP, the customer could take advantage of the following metric:

`template_router_reload_failure` should return `1` if any router pods are currently in a "wedged" state, meaning the router pod cannot reload into the newest HAProxy configuration.

This metric is leveraged by the `HAProxyReloadFail` alert, which should fire if the `template_router_reload_failure` metric is returning `1` for at least 5 minutes. 
So, if the customer is concerned about other edge cases causing issues like this in the future, they can simply monitor these metrics/alerts.


Comment 7 Daein Park 2021-05-19 23:55:11 UTC
> I was able to reproduce this issue on 4.8.0-0.ci-2021-05-19-081203 using the following trivial reproducer:

Thank you for your testing. Unfortunately, the upstream fix does not resolve this issue.

> This metric is leveraged by the `HAProxyReloadFail` alert, which should fire if the `template_router_reload_failure` metric is returning `1` for at least 5 minutes. 
> So, if the customer is concerned about other edge cases causing issues like this in the future, than can simply monitor these metrics/alerts.

Great suggestion! It will help with managing the router configuration.

Thank you again for your information and work!

Comment 8 Stephen Greene 2021-05-20 16:59:00 UTC
Spoke with my team, and we have decided that it would be wise to merge something along the lines of https://github.com/openshift/router/pull/285.

When I have time I will review the proposed patch and see if there's any performance implications associated with it (or if there's a better way to implement the fix in general).

Comment 10 Arvind iyengar 2021-05-24 10:28:56 UTC
Verified in "4.8.0-0.nightly-2021-05-21-233425" payload. With this release, there are no more router reload errors when the selector is removed for a service mapped to a route:
oc get clusterversion                          
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-05-21-233425   True        False         4h16m   Cluster version is 4.8.0-0.nightly-2021-05-21-233425

oc get all                                        
NAME                  READY   STATUS    RESTARTS   AGE
pod/hello-openshift   1/1     Running   0          93m

NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/hello-openshift   ClusterIP   <none>        8080/TCP   93m

NAME                                       HOST/PORT                                                             PATH   SERVICES          PORT   TERMINATION   WILDCARD
route.route.openshift.io/hello-openshift   hello-openshift-test1.apps.aiyengar4824.qe.devcluster.openshift.com          hello-openshift   8080                 None

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2021-05-24T08:53:50Z"
  labels:
    name: hello-openshift
  name: hello-openshift
  namespace: test1
  resourceVersion: "97341"
  uid: 0a4c84e9-af89-46a4-86bd-787a9aaeebb3
spec:
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - port: 8080
    protocol: TCP
    targetPort: 8080
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

oc -n openshift-ingress logs router-default-56b4fbb5ff-f4nrd --tail 50
I0524 05:47:04.382092       1 template.go:433] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: c7b3985da3d1341fdac33f4d6bb6994fe29d32b7\nversionFromGit: 4.0.0-299-gc7b3985d\ngitTreeState: clean\nbuildDate: 2021-05-21T18:48:49Z\n"
I0524 05:47:04.383766       1 metrics.go:155] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"=""
I0524 05:47:04.388599       1 router.go:191] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I0524 05:47:04.388679       1 router.go:270] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I0524 05:47:04.389386       1 router.go:332] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I0524 05:47:04.389442       1 router.go:262] router "msg"="router is including routes in all namespaces"  
E0524 05:47:04.494692       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I0524 05:47:04.530597       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:47:09.521271       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:47:35.597216       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:47:40.569203       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:47:45.606777       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:54:19.025876       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:54:35.750379       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:55:09.203483       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 05:55:14.196201       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 06:07:52.951984       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 06:08:25.395947       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 06:08:30.377415       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I0524 06:08:55.400129       1 router.go:579] template "msg"="router reloaded"  "output"=" - Proxy protocol on, checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"

Comment 14 Stephen Greene 2021-07-08 23:07:07 UTC
xref https://github.com/kubernetes/kubernetes/issues/103576

Comment 16 errata-xmlrpc 2021-07-27 23:08:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.


Comment 17 Jie Wu 2021-09-14 04:35:15 UTC
The customer hit the same error in an OCP 4.7.13 environment.

The fix commit was added on 2021-05-22.

I confirmed that the issue is fixed in openshift4/ose-haproxy-router 4.7.16 (buildDate: 2021-06-03T23:22:11Z) and later.

OpenShift Container Platform 4.7.16 container image list

# podman run -it --entrypoint=/usr/bin/openshift-router  openshift4/ose-haproxy-router:v4.7.0-202106032231.p0.git.5a0e656 version

commitFromGit: 5a0e6561b0480df9f32a8ef87a54a1dc4cf91b93
versionFromGit: 4.0.0-268-g5a0e6561
gitTreeState: clean
buildDate: 2021-06-03T23:22:11Z

