Bug 1896914 - Route with `haproxy.router.openshift.io/timeout: 365d` kills the ingress controller
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Routing
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ---
Target Release: 4.5.z
Assignee: Stephen Greene
QA Contact: Arvind iyengar
URL:
Whiteboard:
Depends On: 1896905
Blocks:
Reported: 2020-11-11 20:06 UTC by Stephen Greene
Modified: 2021-11-02 15:42 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1896905
Environment:
Last Closed: 2020-12-15 20:28:44 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift router pull 218 0 None closed [release-4.5] Bug 1896914: Back port Clip haproxy.router.openshift.io/timeout annotation values to prevent bricking on u... 2021-01-17 13:30:15 UTC
Red Hat Product Errata RHSA-2020:5359 0 None None None 2020-12-15 20:29:12 UTC

Comment 2 Arvind iyengar 2020-11-25 08:48:29 UTC
Verified in the "4.5.0-0.ci.test-2020-11-25-061734-ci-ln-g7zsxx2" CI image. With the patch, the "timer overflow" no longer occurs and router restarts are not disrupted:
----
$ oc get clusterversion
NAME      VERSION                                           AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.ci.test-2020-11-25-061734-ci-ln-g7zsxx2   True        False         83m     Cluster version is 4.5.0-0.ci.test-2020-11-25-061734-ci-ln-g7zsxx2

Route with a very large timeout annotation:
$ oc annotate route service-unsecure haproxy.router.openshift.io/timeout=9999d  <---
$ oc describe route service-unsecure
Name:			service-unsecure
Namespace:		test1
Created:		26 minutes ago
Labels:			name=service-unsecure
Annotations:		haproxy.router.openshift.io/timeout=9999d <---

In the haproxy configuration: 

backend be_http:test1:service-unsecure
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  2147483647ms <--- [the value is clipped to 2147483647ms, ~24.85 days]

After restarts, the router runs without any errors, unlike older versions, where a restart loop would have been triggered:

$ oc -n openshift-ingress get pods -o wide
NAME                                   READY   STATUS    RESTARTS   AGE   IP            NODE                                                NOMINATED NODE   READINESS GATES
router-default-676754f5c4-gkw2m        1/1     Running   0          96m   10.129.2.3    ci-ln-g7zsxx2-002ac-qnx24-worker-centralus2-9dbkk   <none>           <none>
router-default-676754f5c4-p6qls        1/1     Running   0          96m   10.131.0.11   ci-ln-g7zsxx2-002ac-qnx24-worker-centralus3-k4zx9   <none>           <none>
router-internalapps-6569bb474b-svlsj   2/2     Running   0          74s   10.129.2.9    ci-ln-g7zsxx2-002ac-qnx24-worker-centralus2-9dbkk   <none>           <none>

$ oc -n openshift-ingress logs router-internalapps-6569bb474b-svlsj -c router --tail 50
I1125 08:28:37.868271       1 template.go:298] router "msg"="starting router"  "version"="majorFromGit: \nminorFromGit: \ncommitFromGit: adc7d59\nversionFromGit: v0.0.0-unknown\ngitTreeState: clean\nbuildDate: 2020-11-25T06:16:03Z\n"
I1125 08:28:37.870356       1 metrics.go:154] metrics "msg"="router health and metrics port listening on HTTP and HTTPS"  "address"="0.0.0.0:1936"
I1125 08:28:37.877969       1 router.go:164] template "msg"="creating a new template router"  "writeDir"="/var/lib/haproxy"
I1125 08:28:37.878036       1 router.go:239] template "msg"="router will coalesce reloads within an interval of each other"  "interval"="5s"
I1125 08:28:37.878550       1 router.go:301] template "msg"="watching for changes"  "path"="/etc/pki/tls/private"
I1125 08:28:37.878626       1 router.go:257] router "msg"="router is including routes in all namespaces"  
I1125 08:28:37.878717       1 reflector.go:175] Starting reflector *v1.Service (30m0s) from github.com/openshift/router/pkg/router/template/service_lookup.go:33
I1125 08:28:37.879889       1 reflector.go:175] Starting reflector *v1.Route (30m0s) from github.com/openshift/router/pkg/router/controller/factory/factory.go:116
I1125 08:28:37.879951       1 reflector.go:175] Starting reflector *v1.Endpoints (30m0s) from github.com/openshift/router/pkg/router/controller/factory/factory.go:116
E1125 08:28:37.986820       1 haproxy.go:418] can't scrape HAProxy: dial unix /var/lib/haproxy/run/haproxy.sock: connect: no such file or directory
I1125 08:28:38.021164       1 router.go:536] template "msg"="router reloaded"  "output"="[ALERT] 329/082837 (22) : sendmsg()/writev() failed in logger #1: No such file or directory (errno=2)\n - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1125 08:28:43.029006       1 router.go:536] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1125 08:28:55.589888       1 router.go:536] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
I1125 08:29:00.591829       1 router.go:536] template "msg"="router reloaded"  "output"=" - Checking http://localhost:80 ...\n - Health check ok : 0 retry attempt(s).\n"
----
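
The clipping seen in the configuration above (9999d rendered as 2147483647ms) can be illustrated with a minimal sketch. This is not the router's actual code: the function name clip_timeout, the accepted unit suffixes, and the choice to pass in-range values through unchanged are assumptions made for illustration. Only the cap of 2147483647ms is taken from the verified output.

```python
import re

# Cap taken from the verified haproxy configuration above (2**31 - 1 ms).
MAX_TIMEOUT_MS = 2147483647

# Milliseconds per unit. Assumption: only these suffixes are handled here.
UNIT_MS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def clip_timeout(value: str) -> str:
    """Return value unchanged if it fits under the cap, else the cap itself.

    Hypothetical helper for illustration; not the router's implementation.
    """
    text = value.strip()
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d)", text)
    if match is None:
        raise ValueError(f"unsupported timeout value: {value!r}")
    amount, unit = int(match.group(1)), match.group(2)
    total_ms = amount * UNIT_MS[unit]
    if total_ms > MAX_TIMEOUT_MS:
        return f"{MAX_TIMEOUT_MS}ms"
    return text


if __name__ == "__main__":
    for annotation in ("30s", "24d", "365d", "9999d"):
        print(annotation, "->", clip_timeout(annotation))
```

Under these assumptions, 24d (2073600000 ms) stays as written, while 365d and 9999d both exceed the cap and come out as 2147483647ms, which is about 24.85 days.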

Comment 7 errata-xmlrpc 2020-12-15 20:28:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.5.23 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5359

